DNA Biochemistry and Structure



DNA 



Deoxyribonucleic acid (DNA) is a nucleic acid that 
contains the genetic instructions used in the 
development and functioning of all known living 
organisms and some viruses. The main role of DNA 
molecules is the long-term storage of information. 
DNA is often compared to a set of blueprints or a 
recipe, or a code, since it contains the instructions 
needed to construct other components of cells, such 
as proteins and RNA molecules. The DNA segments 
that carry this genetic information are called genes, 
but other DNA sequences have structural purposes, 
or are involved in regulating the use of this genetic 
information. 

Chemically, DNA consists of two long polymers of 
simple units called nucleotides, with backbones made 
of sugars and phosphate groups joined by ester 
bonds. These two strands run in opposite directions 
to each other and are therefore anti-parallel. 
Attached to each sugar is one of four types of
molecules called bases. It is the sequence of these
four bases along the backbone that encodes
information. This information is read using the
genetic code, which specifies the sequence of the
amino acids within proteins. The code is read by copying stretches of DNA into the related
nucleic acid RNA, in a process called transcription.

The structure of part of a DNA double helix.

Within cells, DNA is organized into structures called chromosomes. These chromosomes are 
duplicated before cells divide, in a process called DNA replication. Eukaryotic organisms 
(animals, plants, fungi, and protists) store most of their DNA inside the cell nucleus and 
some of their DNA in the mitochondria. Prokaryotes (bacteria and archaea), however, store
their DNA in the cell's cytoplasm. Within the chromosomes, chromatin proteins such as 
histones compact and organize DNA. These compact structures guide the interactions 
between DNA and other proteins, helping control which parts of the DNA are transcribed. 
Properties

DNA is a long polymer made from repeating units called nucleotides. [1] [2] [3] The DNA
chain is 22 to 26 Angstroms wide (2.2 to 2.6 nanometres), and one nucleotide unit is 3.3 Å
(0.33 nm) long. [4] Although each individual repeating unit is very small, DNA polymers can
be very large molecules containing millions of nucleotides. For instance, the largest human
chromosome, chromosome number 1, is approximately 220 million base pairs long. [5]
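
As a rough worked example of these dimensions, the sketch below multiplies the 0.33 nm length of one nucleotide unit by the roughly 220 million base pairs quoted for chromosome 1. The function name `contour_length_mm` is illustrative only, and the calculation assumes a fully extended, straight helix, which is not how DNA is arranged in a cell.

```python
# Length of one nucleotide unit along the helix, as quoted in the text (0.33 nm).
RISE_NM_PER_BP = 0.33


def contour_length_mm(base_pairs: int) -> float:
    """Approximate length of a fully extended DNA double helix, in millimetres."""
    return base_pairs * RISE_NM_PER_BP * 1e-6  # 1 nm = 1e-6 mm


chromosome_1_bp = 220_000_000  # figure quoted in the text
print(f"{contour_length_mm(chromosome_1_bp):.1f} mm")
```

Under these assumptions the molecule would be several centimetres long while remaining only about 2 nm wide.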
The chemical structure of DNA. Hydrogen bonds are shown as dotted lines.



In living organisms, DNA does not usually exist as a single molecule, but instead as a pair
of molecules that are held tightly together. These two long strands entwine like vines, in
the shape of a double helix. The nucleotide repeats contain both the segment of the
backbone of the molecule, which holds the chain together, and a base, which interacts with
the other DNA strand in the helix. In general, a base linked to a sugar is called a
nucleoside and a base linked to a sugar and one or more phosphate groups is called a
nucleotide. If multiple nucleotides are linked together, as in DNA, this polymer is called
a polynucleotide.

The backbone of the DNA strand is made from alternating phosphate and sugar residues.
The sugar in DNA is 2-deoxyribose, which is a pentose (five-carbon) sugar. The sugars are 
joined together by phosphate groups that form phosphodiester bonds between the third and 
fifth carbon atoms of adjacent sugar rings. These asymmetric bonds mean a strand of DNA 
has a direction. In a double helix the direction of the nucleotides in one strand is opposite 
to their direction in the other strand. This arrangement of DNA strands is called 
antiparallel. The asymmetric ends of DNA strands are referred to as the 5' (five prime) and 
3' (three prime) ends, with the 5' end being that with a terminal phosphate group and the 3' 
end that with a terminal hydroxyl group. One of the major differences between DNA and 
RNA is the sugar, with 2-deoxyribose being replaced by the alternative pentose sugar 
ribose in RNA. [7] 

The DNA double helix is stabilized by hydrogen bonds between the bases attached to the 
two strands. The four bases found in DNA are adenine (abbreviated A), cytosine (C), 
guanine (G) and thymine (T). These four bases are attached to the sugar/phosphate to form 
the complete nucleotide, as shown for adenosine monophosphate. 
These bases are classified into two types: adenine and guanine are fused five- and
six-membered heterocyclic compounds called purines, while cytosine and thymine are
six-membered rings called pyrimidines. A fifth pyrimidine base, called uracil (U), usually
takes the place of thymine in RNA and differs from thymine by lacking a methyl group on its
ring. Uracil is not usually found in DNA, occurring only as a breakdown product of cytosine.



Grooves 



Twin helical strands form the DNA backbone. Another 
double helix may be found by tracing the spaces, or 
grooves, between the strands. These voids are adjacent 
to the base pairs and may provide a binding site. As the 
strands are not directly opposite each other, the 
grooves are unequally sized. One groove, the major
groove, is 22 Å wide and the other, the minor groove, is
12 Å wide. [11] The narrowness of the minor groove
means that the edges of the bases are more accessible
in the major groove. As a result, proteins like
transcription factors that can bind to specific sequences
in double-stranded DNA usually make contacts to the
sides of the bases exposed in the major groove. [12] This
situation varies in unusual conformations of DNA within 
the cell (see below), but the major and minor grooves 
are always named to reflect the differences in size that 
would be seen if the DNA is twisted back into the 
ordinary B form. 

Base pairing 




Structure of a section of DNA. The bases lie horizontally between the two
spiraling strands.



Each type of base on one strand forms a bond with just one type of base on the other
strand. This is called complementary base pairing. Here, purines form hydrogen bonds to
pyrimidines, with A bonding only to T, and C bonding only to G. This arrangement of two
nucleotides binding together across the double helix is called a base pair. As hydrogen
bonds are not covalent, they can be broken and rejoined relatively easily. The two strands
of DNA in a double helix can therefore be pulled apart like a zipper, either by a
mechanical force or high temperature. [13] As a result of this complementarity, all the
information in the double-stranded sequence of a DNA helix is duplicated on each strand,
which is vital in DNA replication. Indeed, this reversible and specific interaction between
complementary base pairs is critical for all the functions of DNA in living organisms.
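
The pairing rule (A with T, C with G) together with the antiparallel arrangement of the strands means that one strand fully determines the other. A minimal sketch, assuming sequences are written 5' to 3' using only the letters A, C, G and T; the function names are illustrative:

```python
# Watson-Crick pairing partners for the four DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Base-by-base partner of a strand, read in the same left-to-right order."""
    return "".join(COMPLEMENT[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Partner strand written 5'->3' (the strands are antiparallel, so reverse it)."""
    return complement(strand)[::-1]


print(reverse_complement("GATTACA"))  # TGTAATC
```

Applying `reverse_complement` twice returns the original sequence, which is one way to see that each strand carries the full information of the duplex.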



Top, a GC base pair with three hydrogen bonds. Bottom, an AT base pair with two 
hydrogen bonds. Non-covalent hydrogen bonds between the pairs are shown as dashed 
lines. 

The two types of base pairs form different numbers of hydrogen bonds, AT forming two
hydrogen bonds, and GC forming three hydrogen bonds (see figure). DNA with high
GC-content is more stable than DNA with low GC-content, but contrary to popular belief,
this is not due to the extra hydrogen bond of a GC base pair but rather the contribution of
stacking interactions (hydrogen bonding merely provides specificity of the pairing, not
stability). [14] As a result, it is both the percentage of GC base pairs and the overall
length of a DNA double helix that determine the strength of the association between the two
strands of DNA. Long DNA helices with a high GC content have stronger-interacting strands,
while short helices with high AT content have weaker-interacting strands. [15] In biology,
parts of the DNA double helix that need to separate easily, such as the TATAAT Pribnow box
in some promoters, tend to have a high AT content, making the strands easier to pull
apart. [16] In the laboratory, the strength of this interaction can be measured by finding
the temperature required to break the hydrogen bonds, their melting temperature (also
called the Tm value). When all the base pairs in a DNA double helix melt, the strands
separate and exist in solution as two entirely independent molecules. These single-stranded
DNA molecules have no single common shape, but some conformations are more stable than
others. [17]
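
The dependence on both GC content and length can be illustrated with a common laboratory rule of thumb that is not part of the text above: the so-called Wallace rule, which estimates the melting temperature of a short oligonucleotide as about 2 °C per A or T plus 4 °C per G or C. It is only a crude approximation for short sequences, and the function names below are illustrative.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases in a non-empty sequence that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm_celsius(seq: str) -> int:
    """Rule-of-thumb melting temperature for a short oligonucleotide (assumption)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


for oligo in ("TATAAT", "GCGCGC", "GCGCGCGCGCGC"):
    print(oligo, gc_fraction(oligo), wallace_tm_celsius(oligo))
```

In this estimate the AT-only Pribnow box sequence scores lowest, a GC-only sequence of the same length scores higher, and doubling the length raises the estimate again.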
Sense and antisense

A DNA sequence is called "sense" if its sequence is the same as that of a messenger RNA
copy that is translated into protein. [18] The sequence on the opposite strand is called the
"antisense" sequence. Both sense and antisense sequences can exist on different parts of
the same strand of DNA (i.e. both strands contain both sense and antisense sequences). In
both prokaryotes and eukaryotes, antisense RNA sequences are produced, but the functions
of these RNAs are not entirely clear. [19] One proposal is that antisense RNAs are involved
in regulating gene expression through RNA-RNA base pairing. [20]
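
The relationship between the two strands and the messenger RNA can be sketched as follows. The example sequence and the function name are hypothetical; the only facts used are base pairing (with U in place of T in RNA) and the antiparallel orientation of template and product.

```python
# Ribonucleotide that pairs with each DNA base of the template.
DNA_TO_RNA_PARTNER = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe_from_template(template_5to3: str) -> str:
    """RNA built by pairing with the antisense (template) strand, written 5'->3'."""
    # The RNA runs antiparallel to its template, so walk the template backwards.
    return "".join(DNA_TO_RNA_PARTNER[b] for b in reversed(template_5to3.upper()))


sense = "ATGGCCTAA"      # hypothetical sense sequence, 5'->3'
antisense = "TTAGGCCAT"  # its reverse complement, 5'->3'
mrna = transcribe_from_template(antisense)
print(mrna)  # AUGGCCUAA
```

The RNA copied from the antisense strand reads the same as the sense strand, apart from U replacing T, which is why the sense sequence is described as matching the messenger RNA.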

A few DNA sequences in prokaryotes and eukaryotes, and more in plasmids and viruses,
blur the distinction between sense and antisense strands by having overlapping genes. [21]
In these cases, some DNA sequences do double duty, encoding one protein when read along
one strand, and a second protein when read in the opposite direction along the other
strand. In bacteria, this overlap may be involved in the regulation of gene
transcription, [22] while in viruses, overlapping genes increase the amount of information
that can be encoded within the small viral genome. [23]

Supercoiling 

DNA can be twisted like a rope in a process called DNA supercoiling. With DNA in its
"relaxed" state, a strand usually circles the axis of the double helix once every 10.4 base
pairs, but if the DNA is twisted the strands become more tightly or more loosely
wound. [24] If the DNA is twisted in the direction of the helix, this is positive
supercoiling, and the bases are held more tightly together. If they are twisted in the
opposite direction, this is negative supercoiling, and the bases come apart more easily. In
nature, most DNA has slight negative supercoiling that is introduced by enzymes called
topoisomerases. [25] These enzymes are also needed to relieve the twisting stresses
introduced into DNA strands during processes such as transcription and DNA
replication. [26]
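
A rough numerical sketch of over- and underwinding, using the 10.4 base pairs per turn quoted above. The ratio computed here is usually called the superhelical density; it is not defined in the text, and the 5,200 base pair molecule and the function names are hypothetical. The sketch also simplifies by treating the number of strand crossings as the only quantity that changes.

```python
BP_PER_TURN_RELAXED = 10.4  # relaxed helical repeat quoted in the text


def relaxed_turns(n_bp: int) -> float:
    """Number of helical turns in a relaxed molecule of n_bp base pairs."""
    return n_bp / BP_PER_TURN_RELAXED


def superhelical_density(n_bp: int, actual_turns: float) -> float:
    """Relative excess (positive) or deficit (negative) of turns versus relaxed DNA."""
    lk0 = relaxed_turns(n_bp)
    return (actual_turns - lk0) / lk0


# Hypothetical 5,200 bp circular molecule with 30 turns fewer than relaxed.
print(round(relaxed_turns(5200)))                   # 500
print(round(superhelical_density(5200, 470), 3))    # -0.06
```

A negative value corresponds to the underwound, negatively supercoiled state in which the strands come apart more easily.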

Alternate DNA structures 

DNA exists in many possible conformations that include A-DNA, B-DNA, and Z-DNA forms,
although only B-DNA and Z-DNA have been directly observed in functional organisms. The
conformation that DNA adopts depends on the hydration level, DNA sequence, the amount and
direction of supercoiling, chemical modifications of the bases, the type and concentration
of metal ions, as well as the presence of polyamines in solution. [27]

From left to right, the structures of A, B and Z DNA.



Fig. 1. X-ray diffraction patterns of A- and B-DNA: the much lower quality of the B-DNA
X-ray pattern, with only a few Bragg diffraction orders, shows why its analysis requires
the paracrystal model approach. (X-ray patterns are courtesy of Dr. H. R. Wilson, F.R.S.)

The first published reports of A-DNA X-ray diffraction patterns, and also of B-DNA, used
analyses based on Patterson transforms that provided only a limited amount of structural
information for oriented fibers of DNA. [28] [29] An alternate analysis was then proposed by
Wilkins et al., in 1953, for the in vivo B-DNA X-ray diffraction/scattering patterns of highly
hydrated DNA fibers in terms of squares of Bessel functions. [30] In the same journal,
Watson and Crick presented their molecular modeling analysis of the DNA X-ray
diffraction patterns to suggest that the structure was a double helix.

Although the "B-DNA form" is most common under the conditions found in cells, [31] it is not
a well-defined conformation but a family of related DNA conformations [32] that occur at the
high hydration levels present in living cells. Their corresponding X-ray diffraction and
scattering patterns are characteristic of molecular paracrystals with a significant degree
of disorder. [33] [34]

Compared to B-DNA, the A-DNA form is a wider right-handed spiral, with a shallow, wide
minor groove and a narrower, deeper major groove. The A form occurs under
non-physiological conditions in partially dehydrated samples of DNA, while in the cell it
may be produced in hybrid pairings of DNA and RNA strands, as well as in enzyme-DNA
complexes. [35] [36] Segments of DNA where the bases have been chemically modified by
methylation may undergo a larger change in conformation and adopt the Z form. Here, the
strands turn about the helical axis in a left-handed spiral, the opposite of the more common
B form. [37] These unusual structures can be recognized by specific Z-DNA binding proteins
and may be involved in the regulation of transcription. [38]
Structure of a DNA quadruplex formed by telomere repeats. The looped conformation of the
DNA backbone is very different from the typical helical structure. [39]



Quadruplex structures 

At the ends of the linear chromosomes are specialized regions of DNA called telomeres.
The main function of these regions is to allow the cell to replicate chromosome ends using
the enzyme telomerase, as the enzymes that normally replicate DNA cannot copy the extreme
3' ends of chromosomes. [40] These specialized chromosome caps also help protect the DNA
ends, and stop the DNA repair systems in the cell from treating them as damage to be
corrected. [41] In human cells, telomeres are usually lengths of single-stranded DNA
containing several thousand repeats of a simple TTAGGG sequence. [42]



These guanine-rich sequences may stabilize chromosome ends by forming structures of
stacked sets of four-base units, rather than the usual base pairs found in other DNA
molecules. Here, four guanine bases form a flat plate and these flat four-base units then
stack on top of each other, to form a stable G-quadruplex structure. [43] These structures are
stabilized by hydrogen bonding between the edges of the bases and chelation of a metal ion
in the centre of each four-base unit. [44] Other structures can also be formed, with the
central set of four bases coming from either a single strand folded around the bases, or
several different parallel strands, each contributing one base to the central structure.

In addition to these stacked structures, telomeres also form large loop structures called
telomere loops, or T-loops. Here, the single-stranded DNA curls around in a long circle
stabilized by telomere-binding proteins. [45] At the very end of the T-loop, the
single-stranded telomere DNA is held onto a region of double-stranded DNA by the
telomere strand disrupting the double-helical DNA and base pairing to one of the two
strands. This triple-stranded structure is called a displacement loop or D-loop. [43]



Branched DNA 

In DNA, fraying occurs when non-complementary regions exist at the end of an otherwise
complementary double strand of DNA. However, branched DNA can occur if a third strand
of DNA is introduced and contains adjoining regions able to hybridize with the frayed
regions of the pre-existing double strand. Although the simplest example of branched DNA
involves only three strands of DNA, complexes involving additional strands and multiple
branches are also possible. [46]
A DNA structure with multiple branches.



Chemical modifications 
Structure of cytosine with and without the 5-methyl group. After deamination the
5-methylcytosine has the same structure as thymine.



Base modifications 

The expression of genes is influenced by how the DNA is packaged in chromosomes, in a
structure called chromatin. Base modifications can be involved in packaging, with regions
that have low or no gene expression usually containing high levels of methylation of
cytosine bases. For example, cytosine methylation produces 5-methylcytosine, which is
important for X-chromosome inactivation. [47] The average level of methylation varies
between organisms: the worm Caenorhabditis elegans lacks cytosine methylation, while
vertebrates have higher levels, with up to 1% of their DNA containing 5-methylcytosine. [48]
Despite the importance of 5-methylcytosine, it can deaminate to leave a thymine base;
methylated cytosines are therefore particularly prone to mutations. [49] Other base
modifications include adenine methylation in bacteria, the presence of
5-hydroxymethylcytosine in the brain, [50] and the glycosylation of uracil to produce the
"J-base" in kinetoplastids. [51] [52]
Damage



DNA can be damaged by many different sorts of mutagens, which change the DNA sequence.
Mutagens include oxidizing agents, alkylating agents and also high-energy electromagnetic
radiation such as ultraviolet light and X-rays. The type of DNA damage produced depends on
the type of mutagen. For example, UV light can damage DNA by producing thymine dimers,
which are cross-links between pyrimidine bases. [54] On the other hand, oxidants such as
free radicals or hydrogen peroxide produce multiple forms of damage, including base
modifications, particularly of guanosine, and double-strand breaks. [55] A typical human
cell contains about 150,000 bases that have suffered oxidative damage. [56] Of these
oxidative lesions, the most dangerous are double-strand breaks, as these are difficult to
repair and can produce point mutations, insertions and deletions from the DNA sequence, as
well as chromosomal translocations. [57]




A covalent adduct between benzo[a]pyrene, the major
mutagen in tobacco smoke, and DNA.



Many mutagens fit into the space between two adjacent base pairs; this is called
intercalation. Most intercalators are aromatic and planar molecules, and include ethidium
bromide, daunomycin, and doxorubicin. In order for an intercalator to fit between base
pairs, the bases must separate, distorting the DNA strands by unwinding of the double
helix. This inhibits both transcription and DNA replication, causing toxicity and mutations.
As a result, DNA intercalators are often carcinogens, and benzo[a]pyrene diol epoxide,
acridines, aflatoxin and ethidium bromide are well-known examples. [58] [59] [60]
Nevertheless, due to their ability to inhibit DNA transcription and replication, other similar
toxins are also used in chemotherapy to inhibit rapidly growing cancer cells. [61]



Biological functions 

DNA usually occurs as linear chromosomes in eukaryotes, and circular chromosomes in
prokaryotes. The set of chromosomes in a cell makes up its genome; the human genome has
approximately 3 billion base pairs of DNA arranged into 46 chromosomes. [62] The
information carried by DNA is held in the sequence of pieces of DNA called genes.
Transmission of genetic information in genes is achieved via complementary base pairing.
For example, in transcription, when a cell uses the information in a gene, the DNA
sequence is copied into a complementary RNA sequence through the attraction between
the DNA and the correct RNA nucleotides. Usually, this RNA copy is then used to make a
matching protein sequence in a process called translation, which depends on the same
interaction between RNA nucleotides. Alternatively, a cell may simply copy its genetic
information in a process called DNA replication. The details of these functions are covered
in other articles; here we focus on the interactions between DNA and other molecules that
mediate the function of the genome.

Genes and genomes 

Genomic DNA is located in the cell nucleus of eukaryotes, as well as small amounts in 
mitochondria and chloroplasts. In prokaryotes, the DNA is held within an irregularly shaped 
body in the cytoplasm called the nucleoid. [63] The genetic information in a genome is held
within genes, and the complete set of this information in an organism is called its genotype. 
A gene is a unit of heredity and is a region of DNA that influences a particular 
characteristic in an organism. Genes contain an open reading frame that can be 
transcribed, as well as regulatory sequences such as promoters and enhancers, which 
control the transcription of the open reading frame. 

In many species, only a small fraction of the total sequence of the genome encodes protein.
For example, only about 1.5% of the human genome consists of protein-coding exons, with
over 50% of human DNA consisting of non-coding repetitive sequences. [64] The reasons for
the presence of so much non-coding DNA in eukaryotic genomes and the extraordinary
differences in genome size, or C-value, among species represent a long-standing puzzle
known as the "C-value enigma." [65] However, DNA sequences that do not code protein may
still encode functional non-coding RNA molecules, which are involved in the regulation of
gene expression. [66]

Some non-coding DNA sequences play structural roles in chromosomes. Telomeres and
centromeres typically contain few genes, but are important for the function and stability
of chromosomes. [41] [68] An abundant form of non-coding DNA in humans is pseudogenes,
which are copies of genes that have been disabled by mutation. [69] These sequences are
usually just molecular fossils, although they can occasionally serve as raw genetic
material for the creation of new genes through the process of gene duplication.




Transcription and translation 

A gene is a sequence of DNA that contains genetic information and can influence the 
phenotype of an organism. Within a gene, the sequence of bases along a DNA strand 
defines a messenger RNA sequence, which then defines one or more protein sequences. 
The relationship between the nucleotide sequences of genes and the amino-acid sequences 
of proteins is determined by the rules of translation, known collectively as the genetic code.
The genetic code consists of three-letter 'words' called codons formed from a sequence of 
three nucleotides (e.g. ACT, CAG, TTT). 

In transcription, the codons of a gene are copied into messenger RNA by RNA polymerase. 
This RNA copy is then decoded by a ribosome that reads the RNA sequence by base-pairing 
the messenger RNA to transfer RNA, which carries amino acids. Since there are 4 bases in
3-letter combinations, there are 64 possible codons (4^3 combinations). These encode the
twenty standard amino acids, giving most amino acids more than one possible codon. There
are also three 'stop' or 'nonsense' codons signifying the end of the coding region; these are 
the TAA, TGA and TAG codons. 
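
The codon count can be checked by enumerating every three-letter combination of the four bases and setting aside the three stop codons named above. This is a small sketch; the variable names are illustrative.

```python
from itertools import product

BASES = "ACGT"
STOP_CODONS = {"TAA", "TGA", "TAG"}  # stop codons listed in the text

# Every ordered triplet of bases: 4 * 4 * 4 combinations.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
coding_codons = [c for c in codons if c not in STOP_CODONS]

print(len(codons))         # 64
print(len(coding_codons))  # 61
```

With 61 codons left to specify twenty standard amino acids, most amino acids must be encoded by more than one codon.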

Replication 

Cell division is essential for an 
organism to grow, but when a 
cell divides it must replicate 
the DNA in its genome so that 
the two daughter cells have 
the same genetic information 
as their parent. The 
double-stranded structure of 
DNA provides a simple 
mechanism for DNA 
replication. Here, the two 
strands are separated and 
then each strand's
complementary DNA sequence
is recreated by an enzyme called DNA polymerase. This enzyme makes the complementary 
strand by finding the correct base through complementary base pairing, and bonding it 
onto the original strand. As DNA polymerases can only extend a DNA strand in a 5' to 3' 
direction, different mechanisms are used to copy the antiparallel strands of the double 
helix. [71] In this way, the base on the old strand dictates which base appears on the new 
strand, and the cell ends up with a perfect copy of its DNA. 
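
The copying logic described here can be sketched in a few lines: each old strand serves as a template, and its partner is rebuilt by base pairing. This is a simplified model with hypothetical names and an invented example sequence; it ignores the enzymes, the leading and lagging strand mechanisms, and any copying errors.

```python
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def synthesize_partner(template: str) -> str:
    """New strand (5'->3') complementary to a template strand (5'->3')."""
    return "".join(PAIR[base] for base in reversed(template.upper()))


def replicate(duplex):
    """Return two daughter duplexes, each keeping one old strand as template."""
    top, bottom = duplex
    return (top, synthesize_partner(top)), (synthesize_partner(bottom), bottom)


parent = ("ATGCGTTA", synthesize_partner("ATGCGTTA"))
daughter_a, daughter_b = replicate(parent)
print(daughter_a == parent, daughter_b == parent)  # True True
```

Each daughter duplex contains one strand from the parent and one newly built strand, yet both carry the same sequence as the parent.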

Interactions with proteins 

All the functions of DNA depend on interactions with proteins. These protein interactions 
can be non-specific, or the protein can bind specifically to a single DNA sequence. Enzymes 
can also bind to DNA and of these, the polymerases that copy the DNA base sequence in 
transcription and DNA replication are particularly important. 





DNA replication. The double helix is unwound by a helicase and 
topoisomerase. Next, one DNA polymerase produces the leading 
strand copy. Another DNA polymerase binds to the lagging strand. 
This enzyme makes discontinuous segments (called Okazaki 
fragments) before DNA ligase joins them together. 
DNA-binding proteins



Interaction of DNA with histones (shown in white, top). These proteins' basic amino acids 
(below left, blue) bind to the acidic phosphate groups on DNA (below right, red). 

Structural proteins that bind DNA are well-understood examples of non-specific
DNA-protein interactions. Within chromosomes, DNA is held in complexes with structural
proteins. These proteins organize the DNA into a compact structure called chromatin. In
eukaryotes this structure involves DNA binding to a complex of small basic proteins called
histones, while in prokaryotes multiple types of proteins are involved. [72] [73] The histones
form a disk-shaped complex called a nucleosome, which contains two complete turns of
double-stranded DNA wrapped around its surface. These non-specific interactions are
formed through basic residues in the histones making ionic bonds to the acidic
sugar-phosphate backbone of the DNA, and are therefore largely independent of the base
sequence. [74] Chemical modifications of these basic amino acid residues include
methylation, phosphorylation and acetylation. [75] These chemical changes alter the strength
of the interaction between the DNA and the histones, making the DNA more or less
accessible to transcription factors and changing the rate of transcription. [76] Other
non-specific DNA-binding proteins in chromatin include the high-mobility group proteins,
which bind to bent or distorted DNA. [77] These proteins are important in bending arrays of
nucleosomes and arranging them into the larger structures that make up chromosomes. [78]

A distinct group of DNA-binding proteins are those that specifically bind
single-stranded DNA. In humans, replication protein A is the best-understood member
of this family and is used in processes where the double helix is separated, including DNA
replication, recombination and DNA repair. [79] These binding proteins seem to stabilize
single-stranded DNA and protect it from forming stem-loops or being degraded by
nucleases.
In contrast, other proteins have evolved to bind to particular DNA sequences. The most
intensively studied of these are the various transcription factors, which are proteins that
regulate transcription. Each transcription factor binds to one particular set of DNA
sequences and activates or inhibits the transcription of genes that have these sequences
close to their promoters. The transcription factors do this in two ways. Firstly, they can
bind the RNA polymerase responsible for transcription, either directly or through other
mediator proteins; this locates the polymerase at the promoter and allows it to begin
transcription. Alternatively, transcription factors can bind enzymes that modify the
histones at the promoter; this will change the accessibility of the DNA template to the
polymerase. [82]

As these DNA targets can occur throughout an organism's genome, changes in the activity of
one type of transcription factor can affect thousands of genes. [83] Consequently, these
proteins are often the targets of the signal transduction processes that control responses
to environmental changes or cellular differentiation and development. The specificity of
these transcription factors' interactions with DNA comes from the proteins making multiple
contacts to the edges of the DNA bases, allowing them to "read" the DNA sequence. Most of
these base-interactions are made in the major groove, where the bases are most
accessible. [84]

The lambda repressor helix-turn-helix transcription factor bound to its DNA target.




DNA-modifying enzymes



Nucleases and ligases 

Nucleases are enzymes that cut DNA 
strands by catalyzing the hydrolysis of the 
phosphodiester bonds. Nucleases that 
hydrolyse nucleotides from the ends of 
DNA strands are called exonucleases, 
while endonucleases cut within strands. 
The most frequently used nucleases in 
molecular biology are the restriction 
endonucleases, which cut DNA at specific 
sequences. For instance, the EcoRV enzyme shown in the figure recognizes the
6-base sequence 5'-GAT|ATC-3' and makes a cut at the vertical line. In nature, these
enzymes protect bacteria against phage infection by digesting the phage DNA when it
enters the bacterial cell, acting as part of the restriction modification system. [86] In
technology, these sequence-specific nucleases are used in molecular cloning and DNA
fingerprinting.
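
Sequence-specific cutting is straightforward to model on a text sequence. The sketch below looks for the EcoRV site GATATC and cuts between GAT and ATC, as described above. The function name and the test sequence are hypothetical, only one strand is modelled, and real digestion conditions are ignored.

```python
from typing import List

ECORV_SITE = "GATATC"  # recognition sequence given in the text
CUT_OFFSET = 3         # cut falls between GAT and ATC


def ecorv_digest(seq: str) -> List[str]:
    """Split a sequence (5'->3') at every EcoRV site and return the fragments."""
    seq = seq.upper()
    fragments = []
    start = 0
    pos = seq.find(ECORV_SITE)
    while pos != -1:
        cut = pos + CUT_OFFSET
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(ECORV_SITE, pos + 1)
    fragments.append(seq[start:])  # remainder after the last cut
    return fragments


print(ecorv_digest("AAGATATCTTGATATCGG"))  # ['AAGAT', 'ATCTTGAT', 'ATCGG']
```

Because GATATC reads the same on the complementary strand, the cut falls at the same position on both strands, so modelling one strand is enough for this illustration.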



The restriction enzyme EcoRV (green) in a complex with its substrate DNA.
Enzymes called DNA ligases can rejoin cut or broken DNA strands. [87] Ligases are
particularly important in lagging strand DNA replication, as they join together the short
segments of DNA produced at the replication fork into a complete copy of the DNA
template. They are also used in DNA repair and genetic recombination. [87]

Topoisomerases and helicases 

Topoisomerases are enzymes with both nuclease and ligase activity. These proteins change
the amount of supercoiling in DNA. Some of these enzymes work by cutting the DNA helix
and allowing one section to rotate, thereby reducing its level of supercoiling; the enzyme
then seals the DNA break. [25] Other types of these enzymes are capable of cutting one DNA
helix and then passing a second strand of DNA through this break, before rejoining the
helix. [88] Topoisomerases are required for many processes involving DNA, such as DNA
replication and transcription. [26]

Helicases are proteins that are a type of molecular motor. They use the chemical energy in
nucleoside triphosphates, predominantly ATP, to break hydrogen bonds between bases and
unwind the DNA double helix into single strands. [89] These enzymes are essential for most
processes where enzymes need to access the DNA bases.

Polymerases 

Polymerases are enzymes that synthesize polynucleotide chains from nucleoside
triphosphates. The sequences of their products are copies of existing polynucleotide chains,
which are called templates. These enzymes function by adding nucleotides onto the 3'
hydroxyl group of the previous nucleotide in a DNA strand. Consequently, all polymerases
work in a 5' to 3' direction. [90] In the active site of these enzymes, the incoming nucleoside
triphosphate base-pairs to the template: this allows polymerases to accurately synthesize
the complementary strand of their template. Polymerases are classified according to the
type of template that they use.

In DNA replication, a DNA-dependent DNA polymerase makes a copy of a DNA sequence.
Accuracy is vital in this process, so many of these polymerases have a proofreading activity.
Here, the polymerase recognizes the occasional mistakes in the synthesis reaction by the
lack of base pairing between the mismatched nucleotides. If a mismatch is detected, a 3' to
5' exonuclease activity is activated and the incorrect base removed. [91] In most organisms
DNA polymerases function in a large complex called the replisome that contains multiple
accessory subunits, such as the DNA clamp or helicases. [92]

RNA-dependent DNA polymerases are a specialized class of polymerases that copy the 
sequence of an RNA strand into DNA. They include reverse transcriptase, which is a viral 
enzyme involved in the infection of cells by retroviruses, and telomerase, which is required 
for the replication of telomeres.[93] Telomerase is an unusual polymerase because it 
contains its own RNA template as part of its structure.[41] 

Transcription is carried out by a DNA-dependent RNA polymerase that copies the sequence 
of a DNA strand into RNA. To begin transcribing a gene, the RNA polymerase binds to a 
sequence of DNA called a promoter and separates the DNA strands. It then copies the gene 
sequence into a messenger RNA transcript until it reaches a region of DNA called the 
terminator, where it halts and detaches from the DNA. As with human DNA-dependent DNA 
polymerases, RNA polymerase II, the enzyme that transcribes most of the genes in the 
human genome, operates as part of a large protein complex with multiple regulatory and 
accessory subunits.[94] 



Genetic recombination 





Structure of the Holliday junction intermediate in genetic recombination. The four separate 
DNA strands are coloured red, blue, green and yellow.[95] 

A DNA helix usually does not interact with 
other segments of DNA, and in human cells 
the different chromosomes even occupy 
separate areas in the nucleus called 
"chromosome territories".[96] This physical 
separation of different chromosomes is 
important for the ability of DNA to function 
as a stable repository for information, as 
one of the few times chromosomes interact 
is during chromosomal crossover when 
they recombine. Chromosomal crossover is 
when two DNA helices break, swap a 
section and then rejoin. 
Recombination involves the breakage and rejoining of 
two chromosomes (M and F) to produce two 
re-arranged chromosomes (C1 and C2). 
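As a rough illustration of the break, swap and rejoin step, the sketch below exchanges one segment between two equal-length strings standing in for chromosomes M and F. The function name, the break positions and the toy sequences are assumptions made for the example; real crossover acts on double helices and is catalyzed by enzymes.

```python
def crossover(m: str, f: str, start: int, end: int) -> tuple[str, str]:
    """Swap the segment [start, end) between two equal-length sequences."""
    if len(m) != len(f):
        raise ValueError("this sketch assumes equal-length chromosomes")
    if not 0 <= start <= end <= len(m):
        raise ValueError("break points must lie inside the sequences")
    # Break both sequences at the same two points and exchange the middle piece.
    c1 = m[:start] + f[start:end] + m[end:]
    c2 = f[:start] + m[start:end] + f[end:]
    return c1, c2


M = "AAAAAAAAAA"  # toy stand-in for chromosome M
F = "GGGGGGGGGG"  # toy stand-in for chromosome F
C1, C2 = crossover(M, F, 3, 7)
print(C1)  # AAAGGGGAAA
print(C2)  # GGGAAAAGGG
```

No material is gained or lost in this picture: the two products together contain exactly the bases of the two parents, only in a new combination.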



Recombination allows chromosomes to exchange genetic information and produces new 
combinations of genes, which increases the efficiency of natural selection and can be 
important in the rapid evolution of new proteins.[97] Genetic recombination can also be 
involved in DNA repair, particularly in the cell's response to double-strand breaks.[98] 

The most common form of chromosomal crossover is homologous recombination, where the 
two chromosomes involved share very similar sequences. Non-homologous recombination 
can be damaging to cells, as it can produce chromosomal translocations and genetic 
abnormalities. The recombination reaction is catalyzed by enzymes known as recombinases, 
such as RAD51.[99] The first step in recombination is a double-stranded break either caused 
by an endonuclease or damage to the DNA. A series of steps catalyzed in part by the 
recombinase then leads to joining of the two helices by at least one Holliday junction, in 
which a segment of a single strand in each helix is annealed to the complementary strand in 
the other helix. The Holliday junction is a tetrahedral junction structure that can be moved 
along the pair of chromosomes, swapping one strand for another. The recombination 
reaction is then halted by cleavage of the junction and re-ligation of the released DNA.[101] 



Evolution 

DNA contains the genetic information that allows all modern living things to function, grow 
and reproduce. However, it is unclear how long in the 4-billion-year history of life DNA has 
performed this function, as it has been proposed that the earliest forms of life may have 
used RNA as their genetic material.[90][102] RNA may have acted as the central part of early 
cell metabolism as it can both transmit genetic information and carry out catalysis as part 
of ribozymes.[103] This ancient RNA world where nucleic acid would have been used for 
both catalysis and genetics may have influenced the evolution of the current genetic code 
based on four nucleotide bases. This would occur since the number of unique bases in such 
an organism is a trade-off between a small number of bases increasing replication accuracy 
and a large number of bases increasing the catalytic efficiency of ribozymes.[104] 

Unfortunately, there is no direct evidence of ancient genetic systems, as recovery of DNA 
from most fossils is impossible. This is because DNA will survive in the environment for less 
than one million years and slowly degrades into short fragments in solution.[105] Claims for 
older DNA have been made, most notably a report of the isolation of a viable bacterium 
from a salt crystal 250 million years old,[106] but these claims are controversial.[107][108] 



Uses in technology 

Genetic engineering 

Methods have been developed to purify DNA from organisms, such as phenol-chloroform 
extraction, and to manipulate it in the laboratory, such as restriction digests and the 
polymerase chain reaction. Modern biology and biochemistry make intensive use of these 
techniques in recombinant DNA technology. Recombinant DNA is a man-made DNA 
sequence that has been assembled from other DNA sequences. Such sequences can be 
transformed into organisms in the form of plasmids or, in the appropriate format, by using a 
viral vector.[109] The genetically modified organisms produced can be used to produce products 
such as recombinant proteins, used in medical research,[110] or be grown in 
agriculture.[111][112] 



Forensics 

Forensic scientists can use DNA in blood, semen, skin, saliva or hair found at a crime scene 
to identify the matching DNA of an individual, such as a perpetrator. This process is called 
genetic fingerprinting, or more accurately, DNA profiling. In DNA profiling, the lengths of 
variable sections of repetitive DNA, such as short tandem repeats and minisatellites, are 
compared between people. This method is usually an extremely reliable technique for 
identifying a matching DNA.[113] However, identification can be complicated if the scene is 
contaminated with DNA from several people.[114] DNA profiling was developed in 1984 by 
British geneticist Sir Alec Jeffreys,[115] and first used in forensic science to convict Colin 
Pitchfork in the 1988 Enderby murders case. 
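The comparison of repeat lengths between people can be sketched in a few lines of Python. This is only an illustration: the locus names and repeat counts below are invented, each profile is assumed to record two alleles per locus, and real casework adds statistical weighting that is not shown here.

```python
# Hypothetical STR profiles: locus name -> repeat counts of the two alleles.
# All locus names and numbers are invented for illustration only.
scene_sample = {"locus_1": (12, 14), "locus_2": (8, 8), "locus_3": (15, 17)}
suspect_a = {"locus_1": (14, 12), "locus_2": (8, 8), "locus_3": (15, 17)}
suspect_b = {"locus_1": (12, 13), "locus_2": (8, 9), "locus_3": (15, 17)}


def matching_loci(profile_a: dict, profile_b: dict) -> list:
    """Return the shared loci at which both profiles carry the same allele pair."""
    shared = sorted(set(profile_a) & set(profile_b))
    # Allele order within a pair carries no meaning, so compare sorted pairs.
    return [locus for locus in shared
            if sorted(profile_a[locus]) == sorted(profile_b[locus])]


def profiles_match(profile_a: dict, profile_b: dict) -> bool:
    """True only if at least one locus is shared and every shared locus agrees."""
    shared = set(profile_a) & set(profile_b)
    return bool(shared) and len(matching_loci(profile_a, profile_b)) == len(shared)


print(profiles_match(scene_sample, suspect_a))  # True
print(matching_loci(scene_sample, suspect_b))   # ['locus_3']
```

A sample contaminated with DNA from several people would show more than two alleles at some loci, which this two-allele representation cannot express; that is one way to see why mixtures complicate identification.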

People convicted of certain types of crimes may be required to provide a sample of DNA for 
a database. This has helped investigators solve old cases where only a DNA sample was 
obtained from the scene. DNA profiling can also be used to identify victims of mass casualty 
incidents.[117] On the other hand, many convicted people have been released from prison on 
the basis of DNA techniques, which were not available when a crime had originally been 
committed. 



Bioinformatics 

Bioinformatics involves the manipulation, searching, and data mining of DNA sequence 
data. The development of techniques to store and search DNA sequences has led to widely 
applied advances in computer science, especially string searching algorithms, machine 
learning and database theory.[118] String searching or matching algorithms, which find an 
occurrence of a sequence of letters inside a larger sequence of letters, were developed to 
search for specific sequences of nucleotides.[119] In other applications such as text editors, 
even simple algorithms for this problem usually suffice, but DNA sequences cause these 
algorithms to exhibit near-worst-case behaviour due to their small number of distinct 
characters. The related problem of sequence alignment aims to identify homologous 
sequences and locate the specific mutations that make them distinct. These techniques, 
especially multiple sequence alignment, are used in studying phylogenetic relationships and 
protein function.[120] Data sets representing entire genomes' worth of DNA sequences, such 
as those produced by the Human Genome Project, are difficult to use without annotations, 
which label the locations of genes and regulatory elements on each chromosome. Regions 
of DNA sequence that have the characteristic patterns associated with protein- or 
RNA-coding genes can be identified by gene finding algorithms, which allow researchers to 
predict the presence of particular gene products in an organism even before they have been 
isolated experimentally.[121] 
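The simplest form of the string searching problem mentioned above can be written as a direct scan. The sketch below is the naive method, not one of the specialized algorithms the text alludes to; the function name and the example sequence are assumptions for illustration.

```python
def find_all(text: str, pattern: str) -> list[int]:
    """Return every start index at which pattern occurs in text (naive scan)."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    hits = []
    last_start = len(text) - len(pattern)
    # Try every possible alignment of the pattern against the text.
    for start in range(last_start + 1):
        if text[start:start + len(pattern)] == pattern:
            hits.append(start)
    return hits


sequence = "GATTACAGATTACA"  # hypothetical sequence
print(find_all(sequence, "ATTA"))  # [1, 8]
print(find_all("AAAA", "AA"))      # [0, 1, 2]
```

With only four distinct letters, a pattern frequently matches the text for several characters before failing, so a scan of this kind does more repeated comparison work on DNA than on ordinary prose.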



DNA nanotechnology 



DNA nanotechnology uses the unique molecular recognition properties of DNA and other 
nucleic acids to create self-assembling branched DNA complexes with useful 
properties.[123] DNA is thus used as a structural material rather than as a carrier of 
biological information. This has led to the creation of two-dimensional periodic lattices 
(both tile-based as well as using the "DNA origami" method) as well as 
three-dimensional structures in the shapes of polyhedra.[124] Nanomechanical devices and 
algorithmic self-assembly have also been demonstrated,[125] and these DNA structures have 
been used to template the arrangement of other molecules such as gold nanoparticles and 
streptavidin proteins.[126] 

The DNA structure at left (schematic shown) will self-assemble into 
the structure visualized by atomic force microscopy at right. DNA 
nanotechnology is the field which seeks to design nanoscale structures 
using the molecular recognition properties of DNA molecules. Image 
from Strong, 2004.[122] 

History and anthropology 

Because DNA collects mutations over time, which are then inherited, it contains historical 
information and by comparing DNA sequences, geneticists can infer the evolutionary 
history of organisms, their phylogeny.[127] This field of phylogenetics is a powerful tool in 
evolutionary biology. If DNA sequences within a species are compared, population 
geneticists can learn the history of particular populations. This can be used in studies 
ranging from ecological genetics to anthropology; for example, DNA evidence is being used 
to try to identify the Ten Lost Tribes of Israel. [128] [129] 
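One elementary way to compare DNA sequences is to count the fraction of positions at which two aligned sequences differ. The sketch below computes that fraction for made-up sequences; the names and sequences are hypothetical, the inputs are assumed to be already aligned and of equal length, and actual phylogenetic inference uses considerably richer models.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    differences = sum(1 for x, y in zip(seq_a, seq_b) if x != y)
    return differences / len(seq_a)


# Hypothetical aligned sequences from three populations.
sequences = {
    "population_a": "ACGTACGTAC",
    "population_b": "ACGTACGTAT",
    "population_c": "ACGAACTTAT",
}

print(p_distance(sequences["population_a"], sequences["population_b"]))  # 0.1
print(p_distance(sequences["population_a"], sequences["population_c"]))  # 0.3
print(p_distance(sequences["population_b"], sequences["population_c"]))  # 0.2
```

In this toy data, a and b differ least, which is the kind of pattern that is read as a more recent shared ancestry, since inherited mutations accumulate over time.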

DNA has also been used to look at modern family relationships, such as establishing family 
relationships between the descendants of Sally Hemings and Thomas Jefferson. This usage 
is closely related to the use of DNA in criminal investigations detailed above. Indeed, some 
criminal investigations have been solved when DNA from crime scenes has matched 
relatives of the guilty individuals.[130] 

History of DNA research 

DNA was first isolated by the Swiss physician Friedrich Miescher who, in 1869, discovered 
a microscopic substance in the pus of discarded surgical bandages. As it resided in the 
nuclei of cells, he called it "nuclein".[131] In 1919, Phoebus Levene identified the base, 
sugar and phosphate nucleotide unit.[132] Levene suggested that DNA consisted of a string 
of nucleotide units linked together through the phosphate groups. However, Levene 
thought the chain was short and the bases repeated in a fixed order. In 1937 William 
Astbury produced the first X-ray diffraction patterns that showed that DNA had a regular 
structure.[133] 

In 1928, Frederick Griffith discovered that traits of the "smooth" form of the Pneumococcus 
could be transferred to the "rough" form of the same bacteria by mixing killed "smooth" 
bacteria with the live "rough" form.[134] This system provided the first clear suggestion that 
DNA carried genetic information, in the Avery-MacLeod-McCarty experiment, when Oswald 
Avery, along with coworkers Colin MacLeod and Maclyn McCarty, identified DNA as the 
transforming principle in 1943.[135] DNA's role in heredity was confirmed in 1952, when 
Alfred Hershey and Martha Chase in the Hershey-Chase experiment showed that DNA is 
the genetic material of the T2 phage.[136] 
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DNA Helix controversy 



Raymond Gosling 



In 1953 James D. Watson and Francis Crick suggested what is now accepted as the first 
correct double-helix model of DNA structure in the journal Nature.[6] Their double-helix, 
molecular model of DNA was then based on a single X-ray diffraction image (labeled as 
"Photo 51")[137] taken by Rosalind Franklin and Raymond Gosling in May 1952, as well as 
the information that the DNA bases were paired, also obtained through private 
communications from Erwin Chargaff in the previous years. Chargaff's rules played a very 
important role in establishing double-helix configurations for B-DNA as well as A-DNA. 

Experimental evidence supporting the Watson and Crick model was published in a series 
of five articles in the same issue of Nature.[138] Of these, Franklin and Gosling's paper was 
the first publication of their own X-ray diffraction data and original analysis method that 
partially supported the Watson and Crick model;[29][139] this issue also contained an article 
on DNA structure by Maurice Wilkins and two of his colleagues, whose analysis and in vivo 
B-DNA X-ray patterns also supported the presence in vivo of the double-helical DNA 
configurations as proposed by Crick and Watson for their double-helix molecular model of 
DNA in the previous two pages of Nature.[30] In 1962, after Franklin's death, Watson, Crick, 
and Wilkins jointly received the Nobel Prize in Physiology or Medicine.[140] Unfortunately, 
Nobel rules of the time allowed only living recipients, but a vigorous debate continues on 
who should receive credit for the discovery.[141] 

In an influential presentation in 1957, Crick laid out the "Central Dogma" of molecular 
biology, which foretold the relationship between DNA, RNA, and proteins, and articulated 
the "adaptor hypothesis". Final confirmation of the replication mechanism that was 
implied by the double-helical structure followed in 1958 through the Meselson-Stahl 
experiment.[143] Further work by Crick and coworkers showed that the genetic code was 
based on non-overlapping triplets of bases, called codons, allowing Har Gobind Khorana, 
Robert W. Holley and Marshall Warren Nirenberg to decipher the genetic code.[144] These 
findings represent the birth of molecular biology. 
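The idea of non-overlapping triplets can be made concrete with a short Python sketch that cuts a sequence into consecutive codons and looks each one up. Only a handful of codons from the standard genetic code are included, written as DNA coding-strand triplets; the table name, function names and example sequence are assumptions for illustration, and unknown codons are simply marked "?".

```python
# A small subset of the standard genetic code (DNA coding-strand codons).
CODON_SUBSET = {
    "ATG": "Met", "TTT": "Phe", "GGC": "Gly", "AAA": "Lys",
    "TAA": "Stop", "TAG": "Stop", "TGA": "Stop",
}


def split_codons(seq: str) -> list[str]:
    """Cut a sequence into consecutive, non-overlapping triplets."""
    usable = len(seq) - len(seq) % 3  # ignore an incomplete trailing triplet
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def translate(seq: str) -> list[str]:
    """Translate codon by codon until a stop codon is reached."""
    residues = []
    for codon in split_codons(seq):
        residue = CODON_SUBSET.get(codon, "?")  # "?" marks codons outside the subset
        if residue == "Stop":
            break
        residues.append(residue)
    return residues


gene = "ATGTTTGGCAAATAA"  # hypothetical sequence
print(split_codons(gene))  # ['ATG', 'TTT', 'GGC', 'AAA', 'TAA']
print(translate(gene))     # ['Met', 'Phe', 'Gly', 'Lys']
```

Because the triplets do not overlap, each base belongs to exactly one codon, so changing a single base alters at most one amino acid in this picture.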
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• Olby, Robert C. (1994). The path to the double helix: the discovery of DNA. New York: 
Dover Publications. ISBN 0-486-68117-3., first published in October 1974 by MacMillan, 
with foreword by Francis Crick; the definitive DNA textbook, revised in 1994 with a 9 page 
postscript. 

• Olby, Robert C. (2009). Francis Crick: A Biography. Plainview, N.Y: Cold Spring Harbor 
• Ridley, Matt (2006). Francis Crick: discoverer of the genetic code. Ashland, OH: Eminent 
Lives, Atlas Books. ISBN 0-06-082333-X. 

• Berry, Andrew; Watson, James D. (2003). DNA: the secret of life. New York: Alfred A. 
Knopf. ISBN 0-375-41546-7. 

• Stent, Gunther Siegmund; Watson, James D. (1980). The double helix: a personal account 
of the discovery of the structure of DNA. New York: Norton. ISBN 0-393-95075-1. 

• Wilkins, Maurice (2003). The third man of the double helix the autobiography of Maurice 
Wilkins. Cambridge, Eng: University Press. ISBN 0-19-860665-6. 

External links 

• DNA (http://www.dmoz.org/Science/Biology/Biochemistry_and_Molecular_Biology/ 
Biomolecules/Nucleic_Acids/DNA//) at the Open Directory Project 

• DNA binding site prediction on protein (http://pipe.scs.fsu.edu/displar.html) 

• DNA coiling to form chromosomes (http://biostudio.com/c_ education mac. htm) 

• DNA from the Beginning (http://www.dnaftb.org/dnaftb/) Another DNA Learning 
Center site on DNA, genes, and heredity from Mendel to the human genome project. 

• DNA Lab, demonstrates how to extract DNA from wheat using readily available 
equipment and supplies. (http://ca.youtube.com/watch7vHyb7fwduuGM) 

• DNA the Double Helix Game (http://nobelprize.org/educational_games/medicine/ 
dna_double_helix/) From the official Nobel Prize web site 
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• DNA under electron microscope (http://www.fidelitysystems.com/Unlinked_DNA.html) 

• Dolan DNA Learning Center (http://www.dnalc.org/) 

• Double Helix: 50 years of DNA (http://www.nature.com/nature/dna50/archive.html), 
Nature 

• Double Helix 1953-2003 (http://www.ncbe.reading.ac.uk/DNA50/) National Centre 
for Biotechnology Education 

• Francis Crick and James Watson talking on the BBC in 1962, 1972, and 1974 (http:// 
www.bbc.co.uk/bbcfour/audiointerviews/profilepages/crickwatsonl.shtml) 

• Genetic Education Modules for Teachers (http://www.genome.gov/10506718) — DNA 
from the Beginning Study Guide 

• Guide to DNA cloning (http://www.blackwellpublishing.com/trun/artwork/Animations/ 
cloningexp/cloningexp.html) 

• Olby R (January 2003). "Quiet debut for the double helix (http://chem-faculty.ucsd.edu/ 
joseph/CHEM13/DNAl.pdf)". Nature 421 (6921): 402-5. doi: 10.1038/nature01397 
(http://dx.doi.org/10.1038/nature01397). PMID 12540907. http://chem-faculty.ucsd. 
edu/joseph/CHEM13/DNAl.pdf. 

• PDB Molecule of the Month pdb23_l (http://www.rcsb.org/pdb/static. 
do?p=education_discussion/molecule_of_the_month/pdb23_l . html) 

• Rosalind Franklin's contributions to the study of DNA (http://mason.gmu.edu/ 
~emoody/rf ranklin.html) 

• The Register of Francis Crick Personal Papers 1938 - 2007 (http://orpheus.ucsd.edu/ 
speccoll/testing/html/mss0660a.html#abstract) at Mandeville Special Collections 
Library, Geisel Library, University of California, San Diego 

• The Secret Life of DNA - DNA Music compositions (http://www.tjmitchell.com/stuart/ 
dna.html) 

• U.S. National DNA Day (http://www.genome.gov/10506367) — watch videos and 
participate in real-time chat with top scientists 

• "Clue to chemistry of heredity found (http://www.nytimes.com/packages/pdf/science/ 
dna-article.pdf)". The New York Times. Saturday, June 13, 1953. http://www.nytimes. 
com/packages/pdf/science/dna-article.pdf. The first American newspaper coverage of 
the discovery of the DNA structure. 
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Paracrystalline 

Paracrystalline materials are defined as having short and medium range ordering in their 
lattice (similar to the liquid crystal phases) but lacking long-range ordering in at least one 
direction. 

Ordering is the regularity in which atoms appear in a predictable lattice, as measured from 
one point. In a highly ordered, perfectly crystalline material, or single crystal, the location 
of every atom in the structure can be described exactly by measuring out from a single origin. 
Conversely, in a disordered structure such as a liquid or amorphous solid, the location of 
the first and perhaps second nearest neighbors can be described from an origin (with some 
degree of uncertainty) and the ability to predict locations decreases rapidly from there out. 
The distance at which atom locations can be predicted is referred to as the correlation 
length ξ. A paracrystalline material exhibits a correlation length somewhere between those 
of fully amorphous and fully crystalline materials. 

The primary, most accessible source of crystallinity information is X-ray diffraction, 
although other techniques may be needed to observe the complex structure of 
paracrystalline materials, such as fluctuation electron microscopy in combination with 
density of states modeling of electronic and vibrational states. 



Paracrystalline Model 

The paracrystalline model is a revision of the Continuous Random Network model first 
proposed by W. H. Zachariasen in 1932. The paracrystal model is defined as highly 
strained, microcrystalline grains surrounded by fully amorphous material. This is a 
higher energy state than the continuous random network model. The important distinction 
between this model and the microcrystalline phases is the lack of defined grain boundaries 
and highly strained lattice parameters, which makes calculations of molecular and lattice 
dynamics difficult. A general theory of paracrystals has been formulated in a basic 
textbook, and then further developed and refined by various authors. 



Applications 

The paracrystal model has been useful, for example, in describing the state of partially 
amorphous semiconductor materials after deposition. It has also been successfully applied 
to synthetic polymers, liquid crystals, biopolymers and biomembranes. 



See also 

• X-ray scattering 

• Amorphous solid 

• Single Crystal 

• Polycrystalline 

• Crystallography 

• ->DNA 

• X-ray pattern of a B-DNA Paracrystal 
Vibrational circular dichroism (VCD) spectroscopy is basically circular dichroism 
spectroscopy in the infrared and near infrared ranges. Because VCD is sensitive to the 
mutual orientation of distinct groups in a molecule, it provides three-dimensional structural 
information. Thus, it is a powerful technique, as VCD spectra of enantiomers can be 
simulated using ab initio calculations, thereby allowing the identification of absolute 
configurations of small molecules in solution from VCD spectra. Among such quantum 
computations of VCD spectra resulting from the chiral properties of small organic 
molecules are those based on density functional theory (DFT) and gauge-invariant atomic 
orbitals (GIAO). A simple example of the experimental results obtained by VCD is the 
spectral data obtained within the carbon-hydrogen (C-H) stretching region of 21 
amino acids in heavy water solutions. Measurements of vibrational optical activity (VOA) 
thus have numerous applications, not only for small molecules, but also for large and 
complex biopolymers such as muscle proteins (myosin, for example) and -> DNA. 
Hydrogen 

Molecular agitation in an aqueous medium 



VCD of peptides and proteins 

Extensive VCD studies have been reported for both polypeptides and several proteins in 
solution; several recent reviews were also compiled. An extensive but 
not comprehensive VCD publications list is also provided in the "References" section. The 
published reports over the last 22 years have established VCD as a powerful technique with 
improved results over those previously obtained by visible/UV circular dichroism (CD) or 
optical rotatory dispersion (ORD) for proteins and nucleic acids. 



Amino acid and polypeptide structures 




Amino acid sequence 
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VCD of nucleic acids 

VCD spectra of nucleotides, synthetic polynucleotides and several nucleic acids, including 
DNA, have been reported and assigned in terms of the type and number of helices present 
in A-, B-, and Z-DNA. 

VCD Instrumentation 

For biopolymers such as proteins and nucleic acids, the difference in absorbance between 
the levo- and dextro- configurations is five orders of magnitude smaller than the 
corresponding (unpolarized) absorbance. Therefore, VCD of biopolymers requires the use of 
very sensitive, specially built instrumentation as well as time-averaging over relatively long 
intervals of time, even with such sensitive VCD spectrometers. Most CD instruments 
produce left- and right-circularly polarized light which is then either sine-wave or 
square-wave modulated, with subsequent phase-sensitive detection and lock-in 
amplification of the detected signal. In the case of FT-VCD, a photo-elastic modulator (PEM) 
is employed in conjunction with an FT-IR interferometer set-up. An example is that of a 
Bomem model MB-100 FT-IR interferometer equipped with the additional polarizing 
optics/accessories needed for recording VCD spectra. A parallel beam emerges through a side 
port of the interferometer and passes first through a wire grid linear polarizer and then 
through an octagonal-shaped ZnSe crystal PEM, which modulates the polarized beam at a 
fixed, lower frequency such as 37.5 kHz. A mechanically stressed crystal such as ZnSe 
exhibits birefringence when stressed by an adjacent piezoelectric transducer. The linear 
polarizer is positioned close to, and at 45 degrees with respect to, the ZnSe crystal axis. 
The polarized radiation focused onto the detector is doubly modulated, both by the PEM 
and by the interferometer setup. A very low noise detector, such as MCT (HgCdTe), is also 
selected for the VCD signal phase-sensitive detection. Quasi-complete commercial FT-VCD 
instruments are also available from a few manufacturers, but these are quite expensive and 
still have to be considered as being at the prototype stage. To prevent detector 
saturation, an appropriate long wave pass filter is placed before the very low noise MCT 
detector, which allows only radiation below 1750 cm⁻¹ to reach the MCT detector; the latter, 
however, measures radiation only down to 750 cm⁻¹. FT-VCD spectra accumulation of the 
selected sample solution is then carried out, digitized and stored by an in-line computer. 
Published reviews that compare various VCD methods are also available.[10] 
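The numbers quoted above can be put on a more familiar scale with a short calculation. The sketch below converts the 1750 cm⁻¹ filter cutoff and the 750 cm⁻¹ detector limit to wavelengths using the standard relation wavelength (µm) = 10⁴ / wavenumber (cm⁻¹), and estimates the size of a VCD signal for an assumed sample absorbance of 1.0; the function name and the absorbance value are hypothetical.

```python
def wavenumber_to_wavelength_um(wavenumber_per_cm: float) -> float:
    """Convert a wavenumber in cm^-1 to a wavelength in micrometres."""
    if wavenumber_per_cm <= 0:
        raise ValueError("wavenumber must be positive")
    # 1 cm = 10^4 micrometres, so wavelength_um = 10^4 / wavenumber.
    return 1.0e4 / wavenumber_per_cm


filter_cutoff_um = wavenumber_to_wavelength_um(1750.0)
detector_limit_um = wavenumber_to_wavelength_um(750.0)
print(f"{filter_cutoff_um:.2f} um to {detector_limit_um:.2f} um")  # 5.71 um to 13.33 um

# Assumed sample absorbance; the VCD difference signal is taken to be
# five orders of magnitude smaller, as stated in the text.
ABSORBANCE = 1.0
delta_absorbance = ABSORBANCE * 1e-5
print(delta_absorbance)  # 1e-05
```

The usable window therefore spans roughly 5.7 to 13.3 µm, and a difference signal near 10⁻⁵ absorbance units gives a sense of why long signal averaging is needed.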




Vibrational circular dichroism 







Magnetic VCD 

VCD spectra have also been reported in the presence of an applied external magnetic 
field.[11] This method can enhance the VCD spectral resolution for small 
molecules.[14][15][16] 



Raman optical activity (ROA) 

ROA is a technique complementary to VCD, especially useful in the 50-1600 cm⁻¹ spectral 
region; it is considered as the technique of choice for determining optical activity for 
photon energies less than 600 cm⁻¹. 
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DNA Molecular Dynamics 



Molecular models of DNA 



Molecular models of DNA structures are representations of the molecular geometry and 
topology of deoxyribonucleic acid (-> DNA) molecules using one of several means, such as: 
closely packed spheres (CPK models) made of plastic, metal wires for 'skeletal models', 
graphic computations and animations by computers, artistic rendering, and so on, with the 
aim of simplifying and presenting the essential physical and chemical properties of DNA 
molecular structures either in vivo or in vitro. Computer molecular models also allow 
animations and molecular dynamics simulations that are very important for understanding 
how DNA functions in vivo. Thus, a long-standing dynamic problem is how DNA 
"self-replication" takes place in living cells, which should involve transient uncoiling of 
supercoiled DNA fibers. Although DNA consists of relatively rigid, very large elongated 
biopolymer molecules called "fibers" or chains (that are made of repeating nucleotide units 
of four basic types, attached to deoxyribose and phosphate groups), its molecular structure 
in vivo undergoes dynamic configuration changes that involve dynamically attached water 
molecules and ions. Supercoiling, packing with histones in chromosome structures, and 
other such supramolecular aspects also involve in vivo DNA topology, which is even more 
complex than DNA molecular geometry, thus turning molecular modeling of DNA into an 
especially challenging problem for both molecular biologists and biotechnologists. Like 
other large molecules and biopolymers, DNA often exists in multiple stable geometries (that 
is, it exhibits conformational isomerism) and configurational quantum states which are 
close to each other in energy on the potential energy surface of the DNA molecule. Such 
geometries can also be computed, at least in principle, by employing ab initio quantum 
chemistry methods that have high accuracy for small molecules. Such quantum geometries 
define an important class of ab initio molecular models of DNA whose exploration has 
barely started. 

In an interesting twist of roles, the DNA molecule itself was proposed to 
be utilized for quantum computing. Both DNA nanostructures as well as 
DNA 'computing' biochips have been built (see biochip image at right). 

The more advanced, computer-based molecular models of DNA involve -> 
molecular dynamics simulations as well as quantum mechanical 
computations of vibro-rotations, delocalized molecular orbitals (MOs), 
electric dipole moments, hydrogen-bonding, and so on. 

DNA computing biochip: 3D 
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Importance 

From the very early stages of structural studies of DNA by X-ray diffraction and 
biochemical means, molecular models such as the Watson-Crick double-helix model were 
successfully employed to solve the 'puzzle' of DNA structure, and also to find how the 
latter relates to its key functions in living cells. The first high quality X-ray diffraction 
patterns of A-DNA were reported by Rosalind Franklin and Raymond Gosling in 1953. The 
first calculations of the Fourier transform of an atomic helix were reported one year 
earlier by Cochran, Crick and Vand, and were followed in 1953 by the computation of the 
Fourier transform of a coiled-coil by Crick. The first reports of a double-helix molecular 
model of B-DNA structure were made by Watson and Crick in 1953. Last but not least, 
Maurice F. Wilkins, A. Stokes and H.R. Wilson reported the first X-ray patterns of in vivo 
B-DNA in partially oriented salmon sperm heads. The development of the first correct 
double-helix molecular model of DNA by Crick and Watson may not have been possible 
without the biochemical evidence for the nucleotide base-pairing ([A-T]; [C-G]), or 
Chargaff's rules.[7][8][9][10][12] 
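The link between base-pairing and Chargaff's rules can be checked with a small count. In the Python sketch below, one strand is given as a string, its partner strand is built from the A-T and G-C pairing rules, and the bases of the whole duplex are tallied; the function name and the example strand are hypothetical.

```python
from collections import Counter

# Watson-Crick pairing rules for DNA bases.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def duplex_base_counts(strand: str) -> Counter:
    """Count bases over both strands of the duplex formed by a strand and its partner."""
    # The partner strand runs antiparallel, so pair the bases in reverse order.
    partner = "".join(PAIR[base] for base in reversed(strand))
    return Counter(strand) + Counter(partner)


counts = duplex_base_counts("ATGGCAAA")  # hypothetical strand
print(counts["A"], counts["T"])  # 5 5
print(counts["G"], counts["C"])  # 3 3
```

Whatever single strand is chosen, the duplex always contains as many A as T and as many G as C, while the ratio of A+T to G+C remains free to vary, which is the pattern Chargaff's rules describe.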



Spinning DNA 
generic model. 



Examples of DNA molecular models 

Animated molecular models allow one to visually explore the three-dimensional (3D) 
structure of DNA. The first DNA model is a space-filling, or CPK, model of the DNA 
double-helix whereas the third is an animated wire, or skeletal type, molecular model of 
DNA. The last two DNA molecular models in this series depict quadruplex DNA[13] that 
may be involved in certain cancers.[14][15] The last figure on this panel is a molecular 
model of hydrogen bonds between water molecules in ice that are similar to those found in 
DNA. 
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A-DNA B-DNA 
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Hydrogen 




• Spacefilling model or CPK model - a molecule is represented by overlapping spheres 
representing the atoms. 




Images for DNA Structure Determination from X-Ray 
Patterns 

The following images illustrate both the principles and the main steps involved in 
generating structural information from X-ray diffraction studies of oriented DNA fibers with 
the help of molecular models of DNA that are combined with crystallographic and 
mathematical analysis of the X-ray patterns. From left to right the gallery of images shows: 

• First row: 

• 1. Constructive X-ray interference, or diffraction, following Bragg's Law of X-ray 
"reflection by the crystal planes"; 

• 2. A comparison of A-DNA (crystalline) and highly hydrated B-DNA (paracrystalline) X-ray 
diffraction, and respectively, X-ray scattering patterns (courtesy of Dr. Herbert R. Wilson, 
FRS; see refs. list);

• 3. Purified DNA precipitated in a water jug; 

• 4. The major steps involved in DNA structure determination by X-ray crystallography 
showing the important role played by molecular models of DNA structure in this iterative, 
structure-determination process; 

• Second row: 

• 5. Photo of a modern X-ray diffractometer employed for recording X-ray patterns of DNA 
with major components: X-ray source, goniometer, sample holder, X-ray detector and/or 
plate holder; 

• 6. Illustrated animation of an X-ray goniometer; 

• 7. X-ray detector at the SLAC synchrotron facility; 

• 8. Neutron scattering facility at ISIS in the UK;

• Third and fourth rows: Molecular models of DNA structure at various scales; figure 
#11 is an actual electron micrograph of a DNA fiber bundle, presumably of a single 
Paracrystalline lattice models of B-DNA structures

A paracrystalline lattice, or paracrystal, is a molecular or atomic lattice with significant
amounts (e.g., larger than a few percent) of partial disordering of molecular
arrangements. Limiting cases of the paracrystal model are nanostructures, such as
glasses, liquids, etc., that may possess only local ordering and no global order. Liquid 
crystals also have paracrystalline rather than crystalline structures. 
DNA Helix controversy in 1952






Highly hydrated B-DNA occurs naturally in living cells in such a paracrystalline state, which 
is a dynamic one in spite of the relatively rigid DNA double-helix stabilized by parallel 
hydrogen bonds between the nucleotide base-pairs in the two complementary, helical DNA 
chains (see figures). For simplicity most DNA molecular models omit both water and ions
dynamically bound to B-DNA, and are thus less useful for understanding the dynamic
behaviors of B-DNA in vivo. The physical and mathematical analysis of X-ray [16][17] and
spectroscopic data for paracrystalline B-DNA is therefore much more complicated than that
of crystalline, A-DNA X-ray diffraction patterns. The paracrystal model is also important for
DNA technological applications such as DNA nanotechnology. Novel techniques that
combine X-ray diffraction of DNA with X-ray microscopy in hydrated living cells are now
also being developed (see, for example, "Application of X-ray microscopy in the analysis of
living hydrated cells" [18]).

Genomic and Biotechnology Applications of DNA molecular 
modeling 

The following gallery of images illustrates various uses of DNA molecular modeling in 
Genomics and Biotechnology research applications from DNA repair to PCR and DNA 
nanostructures; each slide contains its own explanation and/or details. The first slide 
presents an overview of DNA applications, including DNA molecular models, with emphasis 
on Genomics and Biotechnology. 

Gallery: DNA Molecular modeling applications 
Databases for DNA molecular models and sequences



X-ray diffraction 

• NDB ID: UD0017 Database [19] 

• X-ray Atlas database [20]

• PDB files of coordinates for nucleic acid structures from X-ray diffraction by NA (incl.
DNA) crystals [21]

• Structure factors downloadable files in CIF format [22]
Neutron scattering

• ISIS neutron source

• ISIS pulsed neutron source: A world centre for science with neutrons & muons at
Harwell, near Oxford, UK. [23]

X-ray microscopy 

• Application of X-ray microscopy in the analysis of living hydrated cells [24]

Electron microscopy

• DNA under electron microscope [25]

Atomic Force Microscopy (AFM)

Two-dimensional DNA junction arrays have been visualized by Atomic Force Microscopy
(AFM) [26]. Other imaging resources for AFM/Scanning probe microscopy (SPM) can be
freely accessed at:

• How SPM Works [27] 

• SPM Image Gallery - AFM STM SEM MFM NSOM and more. [28] 
Gallery of AFM Images 
Mass spectrometry: MALDI informatics

[Figure: MALDI informatics workflow. Panel labels: Data acquisition; Genotype, mutations,
etc.; Nucleic acid quantitation]

Spectroscopy 

• Vibrational circular dichroism (VCD)

• FT-NMR [29] [30]

• NMR Atlas database [31]

• mmCIF downloadable coordinate files of nucleic acids in solution from 2D-FT NMR data [32]

• NMR constraints files for NAs in PDB format [33]

• NMR microscopy [34]

• Microwave spectroscopy

• FT-IR

• FT-NIR [35] [36] [37]

• Spectral, Hyperspectral, and Chemical imaging [38] [39] [40] [41] [42] [43] [44]

• Raman spectroscopy/microscopy [45] and CARS [46]

• Fluorescence correlation spectroscopy [47] [48] [49] [50] [51] [52] [53] [54], Fluorescence
cross-correlation spectroscopy and FRET [55] [56] [57]

• Confocal microscopy [58]
Gallery: CARS (Raman spectroscopy), Fluorescence confocal
microscopy, and Hyperspectral imaging

[Figure: Multispectral/Hyperspectral Comparison]
Genomic and structural databases

• CBS Genome Atlas Database [59]: contains examples of base skews [60]

• The Z curve database of genomes: a 3-dimensional visualization and analysis tool of
genomes [61][62].

• DNA and other nucleic acids' molecular models: Coordinate files of nucleic acids
molecular structure models in PDB and CIF formats [63]
External links

• DNA the Double Helix Game
(http://nobelprize.org/educational_games/medicine/dna_double_helix/) From the official
Nobel Prize web site

• MDDNA: Structural Bioinformatics of DNA (http://humphry.chem.wesleyan.edu:8080/MDDNA/)

• Double Helix 1953-2003 (http://www.ncbe.reading.ac.uk/DNA50/) National Centre
for Biotechnology Education

• DNA under electron microscope (http://www.fidelitysystems.com/Unlinked_DNA.html)

• Ascalaph DNA (http://www.agilemolecule.com/Ascalaph/Ascalaph_DNA.html):
Commercial software for DNA modeling

• DNAlive: a web interface to compute DNA physical properties
(http://mmb.pcb.ub.es/DNAlive). Also allows cross-linking of the results with the UCSC
Genome browser and DNA dynamics.

• DiProDB: Dinucleotide Property Database (http://diprodb.fli-leibniz.de). The database is
designed to collect and analyse thermodynamic, structural and other dinucleotide
properties.

• Further details of mathematical and molecular analysis of DNA structure based on X-ray
data
(http://planetphysics.org/encyclopedia/BesselFunctionsApplicationsToDiffractionByHelicalStructures.html)

• Bessel functions corresponding to Fourier transforms of atomic or molecular helices.
(http://planetphysics.org/?op=getobj&from=objects&name=BesselFunctionsAndTheirApplicationsToDiffractionByHelicalStructures)

• Application of X-ray microscopy in analysis of living hydrated cells
(http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=12379938)

• Characterization in nanotechnology: some PDFs (http://nanocharacterization.sitesled.com/)

• Overview of STM/AFM/SNOM principles with educational videos
(http://www.ntmdt.ru/SPM-Techniques/Principles/)

• SPM Image Gallery - AFM STM SEM MFM NSOM and More
(http://www.rhk-tech.com/results/showcase.php)

• How SPM Works (http://www.parkafm.com/New_html/resources/01general.php)

• U.S. National DNA Day (http://www.genome.gov/10506367): watch videos and
participate in real-time discussions with scientists.

• The Secret Life of DNA - DNA Music compositions
(http://www.tjmitchell.com/stuart/dna.html)
Molecular dynamics

Molecular dynamics (MD) is a form of computer simulation in which atoms and molecules 
are allowed to interact for a period of time by approximations of known physics, giving a 
view of the motion of the atoms. Because molecular systems generally consist of a vast 
number of particles, it is impossible to find the properties of such complex systems 
analytically. When the number of bodies is more than two, no general analytical solution can
be found and the motion may be chaotic (see n-body problem). MD simulation circumvents this
problem by using numerical methods. It represents an interface between laboratory 
experiments and theory, and can be understood as a "virtual experiment". MD probes the 
relationship between molecular structure, movement and function. Molecular dynamics is a 
multidisciplinary method. Its laws and theories stem from mathematics, physics, and 
chemistry, and it employs algorithms from computer science and information theory. It was 
originally conceived within theoretical physics in the late 1950s and early 1960s, but
is applied today mostly in materials science and modeling of biomolecules. 

Before it became possible to simulate molecular dynamics with computers, some undertook 
the hard work of trying it with physical models such as macroscopic spheres. The idea was 
to arrange them to replicate the properties of a liquid. J.D. Bernal said, in 1962: "... I took a 
number of rubber balls and stuck them together with rods of a selection of different lengths 
ranging from 2.75 to 4 inches. I tried to do this in the first place as casually as possible, 
working in my own office, being interrupted every five minutes or so and not remembering 
what I had done before the interruption." Fortunately, now computers keep track of
bonds during a simulation. 

Molecular dynamics is a specialized discipline of molecular modeling and computer 
simulation based on statistical mechanics; the main justification of the MD method is that 
statistical ensemble averages are equal to time averages of the system, known as the 
ergodic hypothesis. MD has also been termed "statistical mechanics by numbers" and 
"Laplace's vision of Newtonian mechanics" of predicting the future by animating nature's 
forces and allowing insight into molecular motion on an atomic scale. However, long
MD simulations are mathematically ill-conditioned, generating cumulative errors in 
numerical integration that can be minimized with proper selection of algorithms and 
parameters, but not eliminated entirely. Furthermore, current potential functions are, in 
many cases, not sufficiently accurate to reproduce the dynamics of molecular systems, so 
the much more computationally demanding Ab Initio Molecular Dynamics method must be 
used. Nevertheless, molecular dynamics techniques allow detailed time and space 
resolution into representative behavior in phase space. 
[Figure: flowchart of the molecular dynamics simulation algorithm]

• Give atoms initial positions $r^{(t=0)}$, choose short $\Delta t$

• Get forces $F = -\nabla V(r^{(i)})$ and $a = F/m$

• Move atoms: $r^{(i+1)} = r^{(i)} + v^{(i)} \Delta t + \frac{1}{2} a \Delta t^2 + \ldots$

• Move time forward: $t = t + \Delta t$

• Repeat as long as you need

Highly simplified description of the molecular dynamics simulation algorithm. The
simulation proceeds iteratively by alternately calculating forces and solving the equations
of motion based on the accelerations obtained from the new forces. In practice, almost all
MD codes use much more complicated versions of the algorithm, including two steps
(predictor and corrector) in solving the equations of motion and many additional steps for
e.g. temperature and pressure control, analysis and output.



Areas of Application 

There is a significant difference between the focus and methods used by chemists and
physicists, and this is reflected in differences in the jargon used by the different fields. In
chemistry and biophysics, the interaction between the particles is either described by a
"force field" (classical MD), a quantum chemical model, or a mix between the two. These
terms are not used in physics, where the interactions are usually described by the name of
the theory or approximation being used and called the potential energy, or just "potential".

Beginning in theoretical physics, the method of MD gained popularity in materials science 
and since the 1970s also in biochemistry and biophysics. In chemistry, MD serves as an 
important tool in protein structure determination and refinement using experimental tools 
such as X-ray crystallography and NMR. It has also been applied with limited success as a 
method of refining protein structure predictions. In physics, MD is used to examine the 
dynamics of atomic-level phenomena that cannot be observed directly, such as thin film 
growth and ion-subplantation. It is also used to examine the physical properties of
nanotechnological devices that have not been, or cannot yet be, created.

In applied mathematics and theoretical physics, molecular dynamics is a part of the 
research realm of dynamical systems, ergodic theory and statistical mechanics in general. 
The concepts of energy conservation and molecular entropy come from thermodynamics. 
Some techniques to calculate conformational entropy such as principal components analysis 
come from information theory. Mathematical techniques such as the transfer operator 
become applicable when MD is seen as a Markov chain. Also, there is a large community of 
mathematicians working on volume preserving, symplectic integrators for more 
computationally efficient MD simulations. 

MD can also be seen as a special case of the discrete element method (DEM) in which the 
particles have spherical shape (e.g. with the size of their van der Waals radii.) Some 
authors in the DEM community employ the term MD rather loosely, even when their 
simulations do not model actual molecules. 
Design Constraints

Design of a molecular dynamics simulation should account for the available computational 
power. Simulation size (n=number of particles), timestep and total time duration must be 
selected so that the calculation can finish within a reasonable time period. However, the 
simulations should be long enough to be relevant to the time scales of the natural processes 
being studied. To make statistically valid conclusions from the simulations, the time span 
simulated should match the kinetics of the natural process. Otherwise, it is analogous to 
making conclusions about how a human walks from less than one footstep. Most scientific 
publications about the dynamics of proteins and DNA use data from simulations spanning 
nanoseconds (1E-9 s) to microseconds (1E-6 s). To obtain these simulations, several 
CPU-days to CPU-years are needed. Parallel algorithms allow the load to be distributed 
among CPUs; an example is the spatial decomposition in LAMMPS. 

During a classical MD simulation, the most CPU intensive task is the evaluation of the 
potential (force field) as a function of the particles' internal coordinates. Within that energy 
evaluation, the most expensive one is the non-bonded or non-covalent part. In Big O
notation, common molecular dynamics simulations scale by $O(n^2)$ if all pairwise
electrostatic and van der Waals interactions must be accounted for explicitly. This
computational cost can be reduced by employing electrostatics methods such as Particle
Mesh Ewald ($O(n \log n)$) or good spherical cutoff techniques ($O(n)$).
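
The difference between all-pairs and cutoff-based evaluation can be illustrated with a small sketch. It assumes a non-periodic box and a plain distance cutoff, both simplifications chosen for illustration rather than taken from any particular MD package. The hypothetical functions `pairs_brute_force` and `pairs_cell_list` count the same interacting pairs: the first tests all n(n-1)/2 pairs, the second bins particles into cells whose side equals the cutoff and only compares neighbouring cells.

```python
import itertools
import random
from collections import defaultdict


def squared_distance(p, q):
    """Squared Euclidean distance between two 3D points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def pairs_brute_force(points, cutoff):
    """Count pairs closer than `cutoff` by testing every pair: O(n^2) work."""
    c2 = cutoff * cutoff
    return sum(1 for p, q in itertools.combinations(points, 2)
               if squared_distance(p, q) < c2)


def pairs_cell_list(points, cutoff):
    """Count the same pairs, comparing only particles in adjacent cells.

    With cells of side `cutoff`, two particles closer than the cutoff always
    sit in the same cell or in neighbouring cells. For roughly uniform density
    the work grows about linearly with the number of particles.
    """
    c2 = cutoff * cutoff
    cells = defaultdict(list)
    for index, p in enumerate(points):
        cells[tuple(int(x // cutoff) for x in p)].append(index)

    offsets = list(itertools.product((-1, 0, 1), repeat=3))
    count = 0
    for cell, members in cells.items():
        for offset in offsets:
            neighbour = tuple(c + o for c, o in zip(cell, offset))
            if neighbour not in cells:
                continue
            for i in members:
                for j in cells[neighbour]:
                    # i < j ensures every pair is counted exactly once
                    if i < j and squared_distance(points[i], points[j]) < c2:
                        count += 1
    return count


# Illustrative data: 200 random points in a 10 x 10 x 10 box (arbitrary units)
random.seed(0)
points = [tuple(random.uniform(0.0, 10.0) for _ in range(3)) for _ in range(200)]
n_brute = pairs_brute_force(points, 1.5)
n_cells = pairs_cell_list(points, 1.5)
```

Both functions return the same count; only the number of distance tests differs. Production codes add periodic images and neighbour-list reuse on top of this idea.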

Another factor that impacts total CPU time required by a simulation is the size of the 
integration timestep. This is the time length between evaluations of the potential. The 
timestep must be chosen small enough to avoid discretization errors (i.e. much shorter than
the period of the fastest vibration in the system). Typical timesteps for classical MD are on
the order of 1 femtosecond (1E-15 s). This value may be extended by using algorithms such as
SHAKE, which constrain the fastest vibrations (e.g. bonds to hydrogens). Multiple
time scale methods have also been developed, which allow for extended times between
updates of slower long-range forces.
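
To make the cost of a small timestep concrete, the following sketch counts the integration steps needed for a given simulated time. The 2 fs timestep (a value often quoted for runs with constrained hydrogen bonds) and the 0.01 s of wall-clock time per step are assumptions for illustration only, not benchmarks.

```python
FEMTOSECOND = 1e-15  # seconds


def number_of_steps(simulated_time_s, timestep_s):
    """Number of integration steps needed to cover the simulated time."""
    return round(simulated_time_s / timestep_s)


def wall_clock_days(n_steps, seconds_per_step):
    """Rough wall-clock estimate; `seconds_per_step` is a hypothetical figure."""
    return n_steps * seconds_per_step / 86400.0


steps_1ns = number_of_steps(1e-9, 1 * FEMTOSECOND)   # 1 ns at 1 fs per step
steps_1us = number_of_steps(1e-6, 2 * FEMTOSECOND)   # 1 microsecond at 2 fs per step
days_1us = wall_clock_days(steps_1us, 0.01)          # assumed 0.01 s per step
```

Under these assumptions a nanosecond needs a million steps and a microsecond needs half a billion, which is why doubling the timestep is worth the extra bookkeeping of constraint algorithms.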

For simulating molecules in a solvent, a choice should be made between explicit solvent and 
implicit solvent. Explicit solvent particles (such as the TIP3P and SPC/E water models) must
be evaluated individually by the force field, while implicit solvents use a mean-field
approach. Using an explicit solvent is computationally expensive, requiring inclusion of
about ten times more particles in the simulation. But the granularity and viscosity of
explicit solvent are essential to reproduce certain properties of the solute molecules. This is
especially important to reproduce kinetics.

In all kinds of molecular dynamics simulations, the simulation box size must be large 
enough to avoid boundary condition artifacts. Boundary conditions are often treated by 
choosing fixed values at the edges, or by employing periodic boundary conditions in which 
one side of the simulation loops back to the opposite side, mimicking a bulk phase. 
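
Periodic boundary conditions are commonly implemented with coordinate wrapping and the minimum image convention. The sketch below assumes a cubic box of side `box`; the function names are illustrative and not tied to any particular MD code.

```python
import math


def wrap(x, box):
    """Map a coordinate back into the primary box [0, box)."""
    return x % box


def minimum_image(dx, box):
    """Shortest displacement between two particles over all periodic images."""
    return dx - box * round(dx / box)


def periodic_distance(p, q, box):
    """Distance between 3D points p and q in a cubic periodic box."""
    return math.sqrt(sum(minimum_image(a - b, box) ** 2 for a, b in zip(p, q)))


box = 10.0
left = (0.5, 0.0, 0.0)
right = (9.5, 0.0, 0.0)
d = periodic_distance(left, right, box)  # neighbours across the boundary
```

Two particles near opposite faces of the box are treated as close neighbours, which is what lets a small box mimic a bulk phase.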

Microcanonical ensemble (NVE) 

In the microcanonical, or NVE ensemble, the system is isolated from changes in moles 
(N), volume (V) and energy (E). It corresponds to an adiabatic process with no heat 
exchange. A microcanonical molecular dynamics trajectory may be seen as an exchange of 
potential and kinetic energy, with total energy being conserved. For a system of N particles 
with coordinates X and velocities V, the following pair of first order differential equations 
may be written in Newton's notation as 
$$F(X) = -\nabla U(X) = M \dot{V}(t)$$
$$V(t) = \dot{X}(t).$$

The potential energy function U(X) of the system is a function of the particle coordinates
X. It is referred to simply as the "potential" in Physics, or the "force field" in Chemistry.
The first equation comes from Newton's laws; the force F acting on each particle in the
system can be calculated as the negative gradient of U(X).

For every timestep, each particle's position X and velocity V may be integrated with a 
symplectic method such as Verlet. The time evolution of X and V is called a trajectory. 
Given the initial positions (e.g. from theoretical knowledge) and velocities (e.g. randomized 
Gaussian), we can calculate all future (or past) positions and velocities. 
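
As a minimal sketch of such an integration, the code below applies the velocity Verlet variant of the Verlet scheme to a single particle in a one-dimensional harmonic potential. The spring constant, mass, timestep and step count are arbitrary illustrative values in reduced units.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate one particle in 1D and return the trajectory [(x, v), ...]."""
    a = force(x) / mass
    trajectory = [(x, v)]
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt   # advance position
        a_new = force(x) / mass              # force at the new position
        v = v + 0.5 * (a + a_new) * dt       # advance velocity with averaged acceleration
        a = a_new
        trajectory.append((x, v))
    return trajectory


# Illustrative system: harmonic oscillator, U(x) = k x^2 / 2, F(x) = -k x
k, m = 1.0, 1.0


def harmonic_force(x):
    return -k * x


def total_energy(x, v):
    return 0.5 * m * v * v + 0.5 * k * x * x


trajectory = velocity_verlet(x=1.0, v=0.0, force=harmonic_force,
                             mass=m, dt=0.01, n_steps=10_000)
e0 = total_energy(*trajectory[0])
max_energy_error = max(abs(total_energy(x, v) - e0) for x, v in trajectory)
```

The total energy is not conserved exactly, but for a small enough timestep its error stays bounded over the run instead of drifting, which is the property that makes symplectic integrators attractive for NVE simulations.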

One frequent source of confusion is the meaning of temperature in MD. Commonly we have 
experience with macroscopic temperatures, which involve a huge number of particles. But 
temperature is a statistical quantity. If there is a large enough number of atoms, statistical 
temperature can be estimated from the instantaneous temperature, which is found by 
equating the kinetic energy of the system to $n k_B T / 2$, where n is the number of degrees of
freedom of the system. 
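
The relation between kinetic energy and instantaneous temperature can be written as a short function. The sketch assumes SI units and, unless told otherwise, three degrees of freedom per atom; real codes usually subtract constrained and centre-of-mass degrees of freedom.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def instantaneous_temperature(masses, velocities, n_dof=None):
    """Temperature from kinetic energy: KE = n_dof * k_B * T / 2.

    masses in kg, velocities as (vx, vy, vz) tuples in m/s.
    n_dof defaults to 3 per atom (an assumption; no constraints removed).
    """
    kinetic = sum(0.5 * m * (vx * vx + vy * vy + vz * vz)
                  for m, (vx, vy, vz) in zip(masses, velocities))
    if n_dof is None:
        n_dof = 3 * len(masses)
    return 2.0 * kinetic / (n_dof * K_B)


# Illustrative check: one argon-like atom moving at the 300 K root-mean-square speed
mass = 6.63e-26                                # kg, approximate mass of an argon atom
speed = math.sqrt(3.0 * K_B * 300.0 / mass)    # m/s
temperature = instantaneous_temperature([mass], [(speed, 0.0, 0.0)])
```

For a single atom this number is of little physical meaning, which is the point made in the text: the estimate only becomes a temperature in the statistical sense when averaged over many atoms or many timesteps.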

A temperature-related phenomenon arises due to the small number of atoms that are used 
in MD simulations. For example, consider simulating the growth of a copper film starting 
with a substrate containing 500 atoms and a deposition energy of 100 eV. In the real world, 
the 100 eV from the deposited atom would rapidly be transported through and shared 
among a large number of atoms ($10^{10}$ or more) with no big change in temperature. When
there are only 500 atoms, however, the substrate is almost immediately vaporized by the 
deposition. Something similar happens in biophysical simulations. The temperature of the 
system in NVE is naturally raised when macromolecules such as proteins undergo 
exothermic conformational changes and binding. 

Canonical ensemble (NVT) 

In the canonical ensemble, moles (N), volume (V) and temperature (T) are conserved. It is 
also sometimes called constant temperature molecular dynamics (CTMD). In NVT, the 
energy of endothermic and exothermic processes is exchanged with a thermostat. 

A variety of thermostat methods are available to add and remove energy from the 
boundaries of an MD system in a realistic way, approximating the canonical ensemble. 
Popular techniques to control temperature include the Nosé-Hoover thermostat, the
Berendsen thermostat, and Langevin dynamics. Note that the Berendsen thermostat might 
introduce the flying ice cube effect, which leads to unphysical translations and rotations of 
the simulated system. 
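
As an illustration of how a thermostat acts on the velocities, the Berendsen scheme rescales them each step by a factor $\lambda = \sqrt{1 + (\Delta t/\tau)(T_0/T - 1)}$. The sketch below applies only this rescaling and ignores the actual dynamics; the coupling time and temperatures are arbitrary example values.

```python
import math


def berendsen_scale(t_current, t_target, dt, tau):
    """Velocity scaling factor of the Berendsen weak-coupling thermostat."""
    return math.sqrt(1.0 + (dt / tau) * (t_target / t_current - 1.0))


def relax_temperature(t_start, t_target, dt, tau, n_steps):
    """Follow the temperature under repeated rescaling (no forces involved).

    Scaling all velocities by lambda multiplies the kinetic energy, and hence
    the instantaneous temperature, by lambda squared.
    """
    temperature = t_start
    for _ in range(n_steps):
        scale = berendsen_scale(temperature, t_target, dt, tau)
        temperature *= scale * scale
    return temperature


# Illustrative values: start 50 K too hot, coupling time 100 timesteps
final_temperature = relax_temperature(350.0, 300.0, dt=0.001, tau=0.1, n_steps=1000)
```

The temperature relaxes exponentially towards the target with time constant `tau`. Because the same factor is applied to every atom, kinetic energy fluctuations are suppressed, which is related to the artifact mentioned in the text.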
Isothermal-Isobaric (NPT) ensemble

In the isothermal-isobaric ensemble, moles (N), pressure (P) and temperature (T) are 
conserved. In addition to a thermostat, a barostat is needed. It corresponds most closely to 
laboratory conditions with a flask open to ambient temperature and pressure. 

In the simulation of biological membranes, isotropic pressure control is not appropriate. 
For lipid bilayers, pressure control occurs under constant membrane area (NPAT) or 
constant surface tension "gamma" (NPγT).

Generalized ensembles 

The replica exchange method is a generalized ensemble. It was originally created to deal 
with the slow dynamics of disordered spin systems. It is also called parallel tempering. The 
replica exchange MD (REMD) formulation tries to overcome the multiple-minima
problem by exchanging the temperature of non-interacting replicas of the system running 
at several temperatures. 
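
The exchange step is usually decided by a Metropolis criterion: a swap between replicas i and j is accepted with probability $\min(1, \exp[(\beta_i - \beta_j)(E_i - E_j)])$, where $\beta = 1/(k_B T)$. The sketch below assumes energies in kcal/mol and hypothetical function names; it shows the acceptance test only, not a full REMD driver.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def swap_probability(energy_i, temp_i, energy_j, temp_j):
    """Metropolis acceptance probability for exchanging two replicas."""
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_swap(energy_i, temp_i, energy_j, temp_j, rng=random):
    """Return True if the exchange is accepted."""
    return rng.random() < swap_probability(energy_i, temp_i, energy_j, temp_j)


# Illustrative energies (kcal/mol) for replicas at 300 K and 320 K
p_favourable = swap_probability(-90.0, 300.0, -100.0, 320.0)    # cold replica has the higher energy
p_unfavourable = swap_probability(-100.0, 300.0, -90.0, 320.0)  # cold replica already has the lower energy
```

A swap that hands the lower-energy configuration to the colder replica is always accepted; the reverse is accepted only occasionally, which preserves the Boltzmann distribution at each temperature.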

Potentials in MD simulations 

A molecular dynamics simulation requires the definition of a potential function, or a 
description of the terms by which the particles in the simulation will interact. In chemistry 
and biology this is usually referred to as a force field. Potentials may be defined at many 
levels of physical accuracy; those most commonly used in chemistry are based on molecular 
mechanics and embody a classical treatment of particle-particle interactions that can 
reproduce structural and conformational changes but usually cannot reproduce chemical 
reactions. 

The reduction from a fully quantum description to a classical potential entails two main 
approximations. The first one is the Born-Oppenheimer approximation, which states that 
the dynamics of electrons is so fast that they can be considered to react instantaneously to 
the motion of their nuclei. As a consequence, they may be treated separately. The second 
one treats the nuclei, which are much heavier than electrons, as point particles that follow 
classical Newtonian dynamics. In classical molecular dynamics the effect of the electrons is 
approximated as a single potential energy surface, usually representing the ground state. 

When finer levels of detail are required, potentials based on quantum mechanics are used; 
some techniques attempt to create hybrid classical/quantum potentials where the bulk of 
the system is treated classically but a small region is treated as a quantum system, usually 
undergoing a chemical transformation. 

Empirical potentials 

Empirical potentials used in chemistry are frequently called force fields, while those used in 
materials physics are called just empirical or analytical potentials. 

Most force fields in chemistry are empirical and consist of a summation of bonded forces 
associated with chemical bonds, bond angles, and bond dihedrals, and non-bonded forces 
associated with van der Waals forces and electrostatic charge. Empirical potentials 
represent quantum-mechanical effects in a limited way through ad-hoc functional 
approximations. These potentials contain free parameters such as atomic charge, van der 
Waals parameters reflecting estimates of atomic radius, and equilibrium bond length, 
angle, and dihedral; these are obtained by fitting against detailed electronic calculations 
(quantum chemical simulations) or experimental physical properties such as elastic
constants, lattice parameters and spectroscopic measurements. 

Because of the non-local nature of non-bonded interactions, they involve at least weak
interactions between all particles in the system. Their calculation is normally the bottleneck in
the speed of MD simulations. To lower the computational cost, force fields employ
numerical approximations such as shifted cutoff radii, reaction field algorithms, particle
mesh Ewald summation, or Particle-Particle Particle-Mesh (P3M).

Chemistry force fields commonly employ preset bonding arrangements (an exception being 
ab-initio dynamics), and thus are unable to model the process of chemical bond breaking 
and reactions explicitly. On the other hand, many of the potentials used in physics, such as 
those based on the bond order formalism can describe several different coordinations of a 
system and bond breaking. Examples of such potentials include the Brenner potential [10] for
hydrocarbons and its further developments for the C-Si-H and C-O-H systems. The ReaxFF
potential can be considered a fully reactive hybrid between bond order potentials and
chemistry force fields. 

Pair potentials vs. many-body potentials

The potential functions representing the non-bonded energy are formulated as a sum over 
interactions between the particles of the system. The simplest choice, employed in many 
popular force fields, is the "pair potential", in which the total potential energy can be 
calculated from the sum of energy contributions between pairs of atoms. An example of 
such a pair potential is the non-bonded Lennard-Jones potential (also known as the 6-12 
potential), used for calculating van der Waals forces. 
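
For illustration, the Lennard-Jones pair potential has the standard form $V(r) = 4\varepsilon[(\sigma/r)^{12} - (\sigma/r)^{6}]$. The sketch below evaluates it, its radial force, and the pairwise sum for a small set of atoms; reduced units with $\varepsilon = \sigma = 1$ are assumed for the example values.

```python
import itertools
import math


def lj_energy(r, epsilon, sigma):
    """Lennard-Jones 6-12 pair energy at separation r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_force(r, epsilon, sigma):
    """Radial force -dV/dr; positive values are repulsive."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r


def total_pair_energy(positions, epsilon, sigma):
    """Total energy as a plain sum over all pairs of atoms (no cutoff)."""
    return sum(lj_energy(math.dist(p, q), epsilon, sigma)
               for p, q in itertools.combinations(positions, 2))


r_min = 2.0 ** (1.0 / 6.0)   # separation at the potential minimum, in units of sigma
well_depth = lj_energy(r_min, 1.0, 1.0)
trimer_energy = total_pair_energy(
    [(0.0, 0.0, 0.0), (r_min, 0.0, 0.0), (2.0 * r_min, 0.0, 0.0)], 1.0, 1.0)
```

The energy is zero at r = σ, reaches its minimum of -ε at r = 2^(1/6) σ, and the total energy of any configuration is simply the sum of such pair terms.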

Another example is the Born (ionic) model of the ionic lattice. The first term in the next 
equation is Coulomb's law for a pair of ions, the second term is the short-range repulsion 
explained by Pauli's exclusion principle and the final term is the dispersion interaction 
term. Usually, a simulation only includes the dipolar term, although sometimes the 
quadrupolar term is included as well. 

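$$U_{ij}(r_{ij}) = \sum \frac{z_i z_j}{4 \pi \epsilon_0} \frac{1}{r_{ij}} + \sum A_l \exp\left(\frac{-r_{ij}}{p_l}\right) + \sum C_l r_{ij}^{-n_l} + \cdots$$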

In many-body potentials, the potential energy includes the effects of three or more particles 
interacting with each other. In simulations with pairwise potentials, global interactions in 
the system also exist, but they occur only through pairwise terms. In many-body potentials, 
the potential energy cannot be found by a sum over pairs of atoms, as these interactions are 
calculated explicitly as a combination of higher-order terms. In the statistical view, the 
dependency between the variables cannot in general be expressed using only pairwise 
products of the degrees of freedom. For example, the Tersoff potential [12], which was
originally used to simulate carbon, silicon and germanium and has since been used for a
wide range of other materials, involves a sum over groups of three atoms, with the angles
between the atoms being an important factor in the potential. Other examples are the
embedded-atom method (EAM) [13] and the Tight-Binding Second Moment Approximation
(TBSMA) potentials [14], where the electron density of states in the region of an atom is
calculated from a sum of contributions from surrounding atoms, and the potential energy 
contribution is then a function of this sum. 
Semi-empirical potentials

Semi-empirical potentials make use of the matrix representation from quantum mechanics. 
However, the values of the matrix elements are found through empirical formulae that 
estimate the degree of overlap of specific atomic orbitals. The matrix is then diagonalized to 
determine the occupancy of the different atomic orbitals, and empirical formulae are used 
once again to determine the energy contributions of the orbitals. 

There is a wide variety of semi-empirical potentials, known as tight-binding potentials,
which vary according to the atoms being modeled. 

Polarizable potentials 

Most classical force fields implicitly include the effect of polarizability, e.g. by scaling up 
the partial charges obtained from quantum chemical calculations. These partial charges are 
stationary with respect to the mass of the atom. But molecular dynamics simulations can 
explicitly model polarizability with the introduction of induced dipoles through different 
methods, such as Drude particles or fluctuating charges. This allows for a dynamic 
redistribution of charge between atoms which responds to the local chemical environment. 

For many years, polarizable MD simulations have been touted as the next generation. For
homogeneous liquids such as water, increased accuracy has been achieved through the
inclusion of polarizability [15]. Some promising results have also been achieved for
proteins [16]. However, it is still uncertain how to best approximate polarizability in a
simulation.

Ab-initio methods 

In classical molecular dynamics, a single potential energy surface (usually the ground state) 
is represented in the force field. This is a consequence of the Born-Oppenheimer 
approximation. If excited states, chemical reactions or a more accurate representation is 
needed, electronic behavior can be obtained from first principles by using a quantum 
mechanical method, such as Density Functional Theory. This is known as Ab Initio 
Molecular Dynamics (AIMD). Due to the cost of treating the electronic degrees of freedom, 
the computational cost of these simulations is much higher than classical molecular
dynamics. This implies that AIMD is limited to smaller systems and shorter periods of time. 

Ab-initio quantum-mechanical methods may be used to calculate the potential energy of a 
system on the fly, as needed for conformations in a trajectory. This calculation is usually 
made in the close neighborhood of the reaction coordinate. Although various 
approximations may be used, these are based on theoretical considerations, not on 
empirical fitting. Ab-initio calculations produce a vast amount of information that is not 
available from empirical methods, such as density of electronic states or other electronic 
properties. A significant advantage of using ab-initio methods is the ability to study 
reactions that involve breaking or formation of covalent bonds, which correspond to 
multiple electronic states. 

A popular program for ab-initio molecular dynamics is the Car-Parrinello Molecular
Dynamics (CPMD) package, based on density functional theory.
Hybrid QM/MM

QM (quantum-mechanical) methods are very powerful. However, they are computationally 
expensive, while the MM (classical or molecular mechanics) methods are fast but suffer 
from several limitations (require extensive parameterization; energy estimates obtained are 
not very accurate; cannot be used to simulate reactions where covalent bonds are 
broken/formed; and are limited in their abilities for providing accurate details regarding the 
chemical environment). A new class of method has emerged that combines the good points 
of QM (accuracy) and MM (speed) calculations. These methods are known as mixed or 
hybrid quantum-mechanical and molecular mechanics methods (hybrid QM/MM). The 
methodology for such techniques was introduced by Warshel and coworkers. In recent
years, it has been developed further by several groups including: Arieh Warshel (University of
Southern California), Weitao Yang (Duke University), Sharon Hammes-Schiffer (The 
Pennsylvania State University), Donald Truhlar and Jiali Gao (University of Minnesota) and 
Kenneth Merz (University of Florida). 

The most important advantage of hybrid QM/MM methods is the speed. The cost of doing 
classical molecular dynamics (MM) in the most straightforward case scales as $O(N^2)$, where N
is the number of atoms in the system. This is mainly due to the electrostatic interactions term
(every particle interacts with every other particle). However, use of a cutoff radius, periodic
pair-list updates and, more recently, variations of the particle-mesh Ewald (PME)
method have reduced this to between $O(N)$ and $O(N^2)$. In other words, if a system with twice
as many atoms is simulated then it would take between two and four times as much computing
power. On the other hand, the simplest ab-initio calculations typically scale as $O(N^3)$ or worse
(Restricted Hartree-Fock calculations have been suggested to scale ~$O(N^{2.7})$). To overcome
the limitation, a small part of the system is treated quantum-mechanically (typically 
active-site of an enzyme) and the remaining system is treated classically. 
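
The scaling argument above can be made concrete with a little arithmetic. The following 
Python sketch is only an illustration: it assumes pure power-law scaling with prefactors of 
1, takes the exponents 1, 2, 2.7 and 3 from the discussion above, and the function names 
and the 10,000-atom system with a 50-atom QM region are hypothetical. 

```python
def cost_ratio(size_factor, exponent):
    """Relative cost when N grows by size_factor and cost scales as N**exponent."""
    return size_factor ** exponent


def hybrid_cost(n_qm, n_mm, qm_exponent=3, mm_exponent=2):
    """Crude QM/MM cost estimate: QM region cost plus MM region cost.

    Assumption: all prefactors are 1 and QM/MM coupling terms are ignored,
    so only the order of magnitude is meaningful.
    """
    return n_qm ** qm_exponent + n_mm ** mm_exponent


if __name__ == "__main__":
    # Effect of doubling the number of atoms under each scaling law.
    for label, exponent in [("O(N)", 1), ("O(N^2)", 2),
                            ("~O(N^2.7)", 2.7), ("O(N^3)", 3)]:
        print(f"{label:10s} doubling N multiplies the cost by "
              f"{cost_ratio(2, exponent):.1f}")

    # Hypothetical system: 10,000 atoms, of which 50 form the QM region.
    n_total, n_qm = 10_000, 50
    full_qm = n_total ** 3
    hybrid = hybrid_cost(n_qm, n_total - n_qm)
    print(f"full QM cost / hybrid cost is roughly {full_qm / hybrid:.0f}")
```

Under these assumptions the hybrid estimate is dominated by the classical part, which is 
the reason for restricting the quantum-mechanical treatment to a small region. 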

In more sophisticated implementations, QM/MM methods treat both light nuclei 
susceptible to quantum effects (such as hydrogens) and electronic states quantum 
mechanically. This allows the generation of hydrogen wave-functions (similar to electronic 
wave-functions). This methodology has been useful in investigating phenomena such as 
hydrogen tunneling. One example where QM/MM methods have provided new discoveries is 
the calculation of hydride transfer in the enzyme liver alcohol dehydrogenase. In this case, 
tunneling is important for the hydrogen, as it determines the reaction rate. [17] 

Coarse-graining and reduced representations 

At the other end of the detail scale are coarse-grained and lattice models. Instead of 
explicitly representing every atom of the system, one uses "pseudo-atoms" to represent 
groups of atoms. MD simulations on very large systems may require such large computer 
resources that they cannot easily be studied by traditional all-atom methods. Similarly, 
simulations of processes on long timescales (beyond about 1 microsecond) are prohibitively 
expensive, because they require so many timesteps. In these cases, one can sometimes 
tackle the problem by using reduced representations, which are also called coarse-grained 
models. 

Examples of coarse-graining (CG) methods are discontinuous molecular dynamics 
(CG-DMD) [18] [19] and Go-models [20]. Coarse-graining is sometimes done by taking larger 
pseudo-atoms. Such united atom approximations have been used in MD simulations of 
biological membranes, where the aliphatic tails of lipids are represented by a few 
pseudo-atoms, each gathering 2-4 methylene groups. 

The parameterization of these very coarse-grained models must be done empirically, by 
matching the behavior of the model to appropriate experimental data or all-atom 
simulations. Ideally, these parameters should account for both enthalpic and entropic 
contributions to free energy in an implicit way. When coarse-graining is done at higher 
levels, the accuracy of the dynamic description may be less reliable. But very 
coarse-grained models have been used successfully to examine a wide range of questions in 
structural biology. 

Examples of applications of coarse-graining in biophysics: 

• protein folding studies are often carried out using a single (or a few) pseudo-atoms per 
amino acid; 

• DNA supercoiling has been investigated using 1-3 pseudo-atoms per basepair, and at 
even lower resolution; 

• Packaging of double-helical DNA into bacteriophage has been investigated with models 
where one pseudo-atom represents one turn (about 10 basepairs) of the double helix; 

• RNA structure in the ribosome and other large systems has been modeled with one 
pseudo-atom per nucleotide. 

The simplest form of coarse-graining is the "united atom" (sometimes called "extended 
atom") and was used in most early MD simulations of proteins, lipids and nucleic acids. For 
example, instead of treating all four atoms of a CH3 methyl group explicitly (or all three 
atoms of a CH2 methylene group), one represents the whole group with a single pseudo-atom. 
This pseudo-atom must, of course, be properly parameterized so that its van der Waals 
interactions with other groups have the proper distance-dependence. Similar 
considerations apply to the bonds, angles, and torsions in which the pseudo-atom 
participates. In this kind of united atom representation, one typically eliminates all explicit 
hydrogen atoms except those that have the capability to participate in hydrogen bonds 
("polar hydrogens"). An example of this is the CHARMM 19 force-field. 

The polar hydrogens are usually retained in the model, because proper treatment of 
hydrogen bonds requires a reasonably accurate description of the directionality and the 
electrostatic interactions between the donor and acceptor groups. A hydroxyl group, for 
example, can be both a hydrogen bond donor and a hydrogen bond acceptor, and it would 
be impossible to treat this with a single OH pseudo-atom. Note that about half the atoms in 
a protein or nucleic acid are nonpolar hydrogens, so the use of united atoms can provide a 
substantial savings in computer time. 
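
The bookkeeping behind the united atom representation can be sketched in a few lines of 
Python. This is an illustrative sketch only: the HeavyAtom record, the two counting 
functions and the ethanol example are hypothetical and are not taken from any force field 
implementation. 

```python
from dataclasses import dataclass


@dataclass
class HeavyAtom:
    element: str     # chemical symbol of the non-hydrogen atom
    hydrogens: int   # number of hydrogens bonded to it
    polar: bool      # True if those hydrogens can take part in hydrogen bonds


def all_atom_count(atoms):
    """Number of interaction sites when every hydrogen is explicit."""
    return sum(1 + atom.hydrogens for atom in atoms)


def united_atom_count(atoms):
    """Number of sites when nonpolar hydrogens are merged into their heavy atom.

    Polar hydrogens (e.g. the H of a hydroxyl group) are kept explicit.
    """
    return sum(1 + (atom.hydrogens if atom.polar else 0) for atom in atoms)


# Hypothetical example: ethanol, CH3-CH2-OH.
ethanol = [
    HeavyAtom("C", 3, polar=False),  # CH3 becomes one pseudo-atom
    HeavyAtom("C", 2, polar=False),  # CH2 becomes one pseudo-atom
    HeavyAtom("O", 1, polar=True),   # O and its polar H stay separate
]

if __name__ == "__main__":
    print("all-atom sites:   ", all_atom_count(ethanol))
    print("united-atom sites:", united_atom_count(ethanol))
```

For ethanol the nine explicit atoms reduce to four sites: two carbon pseudo-atoms, the 
oxygen and the polar hydroxyl hydrogen. 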



Examples of applications 

Molecular dynamics is used in many fields of science. 

• First macromolecular MD simulation published (1977, Size: 500 atoms, Simulation time: 
9.2 ps = 0.0092 ns, Program: CHARMM precursor). Protein: bovine pancreatic trypsin 
inhibitor. This is one of the best-studied proteins in terms of folding and kinetics. Its 
simulation, published in Nature, paved the way for understanding protein motion as 
essential to function and not just accessory. [21] 

• MD is the standard method to treat collision cascades in the heat spike regime, i.e. the 
effects that energetic neutron and ion irradiation have on solids and solid surfaces. [22] [23] 

The following two biophysical examples are not run-of-the-mill MD simulations. They 
illustrate almost heroic efforts to produce simulations of a system of very large size (a 
complete virus) and very long simulation times (500 microseconds): 

• MD simulation of the complete satellite tobacco mosaic virus (STMV) (2006, Size: 1 
million atoms, Simulation time: 50 ns, Program: NAMD). This virus is a small, icosahedral 
plant virus that worsens the symptoms of infection by Tobacco Mosaic Virus (TMV). 
Molecular dynamics simulations were used to probe the mechanisms of viral assembly. 
The entire STMV particle consists of 60 identical copies of a single protein that make up 
the viral capsid (coating), and a 1063-nucleotide single-stranded RNA genome. One key 
finding is that the capsid is very unstable when there is no RNA inside. The simulation 
would take a single 2006 desktop computer around 35 years to complete. It was therefore 
run on many processors in parallel, with continuous communication between them. [24] 

• Folding simulations of the villin headpiece in all-atom detail (2006, Size: 20,000 atoms, 
Simulation time: 500 µs = 500,000 ns, Program: folding@home). This simulation was run 
on 200,000 CPUs of participating personal computers around the world. These 
computers had the folding@home program installed, a large-scale distributed computing 
effort coordinated by Vijay Pande at Stanford University. The kinetic properties of the 
villin headpiece protein were probed by using many independent, short trajectories run 
by CPUs without continuous real-time communication. One technique employed was 
Pfold value analysis, which measures the probability that a specific starting conformation 
folds before it unfolds. Pfold gives information about transition state structures 
and an ordering of conformations along the folding pathway. Each trajectory in a Pfold 
calculation can be relatively short, but many independent trajectories are needed. [25] 
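
The Pfold idea described in the last example can be sketched as a simple counting 
estimate. The Python sketch below is only an illustration and is not the analysis used in 
the cited study: it assumes each trajectory is reduced to a one-dimensional folding 
coordinate between 0 (unfolded) and 1 (folded), and the thresholds, function names and 
example data are hypothetical. 

```python
def first_outcome(trajectory, q_unfolded, q_folded):
    """Return which basin a trajectory reaches first, or None if neither."""
    for q in trajectory:
        if q >= q_folded:
            return "folded"
        if q <= q_unfolded:
            return "unfolded"
    return None  # trajectory ended before committing to either basin


def pfold(trajectories, q_unfolded=0.2, q_folded=0.8):
    """Estimate Pfold as the fraction of committed trajectories that fold first.

    All trajectories are assumed to start from the same conformation.
    Trajectories that reach neither basin are left out of the estimate.
    """
    outcomes = [first_outcome(t, q_unfolded, q_folded) for t in trajectories]
    decided = [o for o in outcomes if o is not None]
    if not decided:
        raise ValueError("no trajectory reached the folded or unfolded basin")
    return decided.count("folded") / len(decided)


# Hypothetical short trajectories started from one conformation (q = 0.5).
example_trajectories = [
    [0.5, 0.6, 0.9],         # folds
    [0.5, 0.3, 0.1],         # unfolds
    [0.5, 0.7, 0.85, 0.1],   # folds first; the later unfolding is ignored
    [0.5, 0.5],              # undecided, not counted
]

if __name__ == "__main__":
    print(f"estimated Pfold = {pfold(example_trajectories):.2f}")
```

Because the estimate is a ratio of counts, its statistical error shrinks only as more 
independent trajectories are added, which is why many short runs are needed. 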



Molecular dynamics algorithms 

Integrators 

• Verlet integration 

• Beeman's algorithm 

• Gear predictor-corrector 

• Constraint algorithms (for constrained systems) 

• Symplectic integrator 
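
As an illustration of the first entry in this list, the following Python sketch applies the 
velocity Verlet scheme, a widely used variant of Verlet integration, to a one-dimensional 
harmonic oscillator. It is a minimal sketch under stated assumptions: the function names, 
the unit mass and force constant, and the time step are hypothetical choices, and real MD 
codes apply the same update to many particles in three dimensions. 

```python
import math


def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate one particle in 1D and return a list of (x, v) pairs."""
    a = force(x) / mass
    trajectory = [(x, v)]
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt   # position update
        a_new = force(x) / mass              # force at the new position
        v = v + 0.5 * (a + a_new) * dt       # velocity from averaged acceleration
        a = a_new
        trajectory.append((x, v))
    return trajectory


def oscillator_energy(x, v, k=1.0, mass=1.0):
    """Total energy of a harmonic oscillator with force constant k."""
    return 0.5 * mass * v * v + 0.5 * k * x * x


def harmonic_force(x, k=1.0):
    """Restoring force F = -k x."""
    return -k * x


if __name__ == "__main__":
    traj = velocity_verlet(x=1.0, v=0.0, force=harmonic_force,
                           mass=1.0, dt=0.01, n_steps=1000)
    x_end, v_end = traj[-1]
    print(f"x(t=10) = {x_end:.4f}, exact cos(10) = {math.cos(10.0):.4f}")
    drift = oscillator_energy(x_end, v_end) - oscillator_energy(*traj[0])
    print(f"energy change over the run = {drift:.2e}")
```

The small, bounded energy error seen here is the practical reason such symplectic schemes 
are preferred for long MD runs. 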

Short-range interaction algorithms 

• Cell lists 

• Verlet list 

• Bonded interactions 

Long-range interaction algorithms 

• Ewald summation 

• Particle Mesh Ewald (PME) 

• Particle-Particle Particle-Mesh (P3M) 

• Reaction Field Method 

Parallelization strategies 

• Domain decomposition method (Distribution of system data for parallel computing) 

• Molecular Dynamics - Parallel Algorithms [26] 

Major software for MD simulations 

• Abalone (classical, implicit water) 

• ABINIT (DFT) 

• ADUN [27] (classical, P2P database for simulations) 

• AMBER (classical) 

• Ascalaph [28] (classical, GPU accelerated) 

• CASTEP (DFT) 

• CPMD (DFT) 

• CP2K [29] (DFT) 

• CHARMM (classical, the pioneer in MD simulation, extensive analysis tools) 

• COSMOS [30] (classical and hybrid QM/MM, quantum-mechanical atomic charges with 
BPT) 

• Desmond [31] (classical, parallelization with up to thousands of CPUs) 

• DLPOLY [32] (classical) 

• ESPResSo (classical, coarse-grained, parallel, extensible) 

• Fireball [33] (tight-binding DFT) 

• GROMACS (classical) 

• GROMOS (classical) 

• GULP (classical) 

• Hippo [34] (classical) 

• LAMMPS (classical, large-scale with spatial-decomposition of simulation domain for 
parallelism) 

• MDynaMix (classical, parallel) 

• MOLDY [35] (classical, parallel) latest release [36] 

• Materials Studio [37] (Forcite MD using COMPASS, Dreiding, Universal, cvff and pcff 
forcefields in serial or parallel, QMERA (QM+MD), ONESTEP (DFT), etc.) 

• MOSCITO (classical) 

• NAMD (classical, parallelization with up to thousands of CPUs) 

• NEWTON-X [38] (ab initio, surface-hopping dynamics) 

• ProtoMol [39] (classical, extensible, includes multigrid electrostatics) 

• PWscf (DFT) 

• S/PHI/nX [40] (DFT) 

• SIESTA (DFT) 

• VASP (DFT) 

• TINKER (classical) 

• YASARA [41] (classical) 

• ORAC [42] (classical) 

• XMD (classical) 

Related software 

• VMD - MD simulation trajectories can be visualized and analyzed. 

• PyMol - Molecular visualization software written in Python 

• Packmol [43] - Package for building starting configurations for MD in an automated fashion 

• Sirius - Molecular modeling, analysis and visualization of MD trajectories 

• esra [44] - Lightweight molecular modeling and analysis library 
(Java/Jython/Mathematica) 

• Molecular Workbench [45] - Interactive molecular dynamics simulations on your desktop 

• BOSS - MC in OPLS 

Specialized hardware for MD simulations 

• Anton - A specialized, massively parallel supercomputer designed to execute MD 
simulations. 

• MDGRAPE - A special purpose system built for molecular dynamics simulations, 
especially protein structure prediction. 

See also 

• Molecular modeling 

• Computational chemistry 

• Energy drift 

• Force field in Chemistry 

• Force field implementation 

• Monte Carlo method 

• Molecular Design software 

• Molecular mechanics 

• Molecular modeling on GPU 

• Protein dynamics 

• Implicit solvation 

• Car-Parrinello method 

• Symplectic numerical integration 

• Software for molecular mechanics modeling 

• Dynamical systems 

• Theoretical chemistry 

• Statistical mechanics 

• Quantum chemistry 

• Discrete element method 

• List of nucleic acid simulation software 

[26] http://www.cs.sandia.gov/~sjplimp/md.html 

[27] http://cbbl.imim.es/Adun 

[28] http://www.agilemolecule.com/Products.html 

[29] http://cp2k.berlios.de/ 

[30] http://www.cosmos-software.de/ce_intro.html 

[31] http://www.DEShawResearch.com/resources.html 

[32] http://www.ccp5.ac.uk/DL_POLY/ 

[33] http://fireball-dft.org 

[34] http://www.biowerkzeug.com/ 

[35] http://www.ccp5.ac.uk/moldy/moldy.html 

[36] http://ccpforge.cse.rl.ac.uk/frs/?group_id=34 

[37] http://accelrys.com/products/materials-studio/ 

[38] http://www.univie.ac.at/newtonx/ 

[39] http://protomol.sourceforge.net/ 

[40] http://www.sphinxlib.de 

[41] http://www.yasara.org 

[42] http://www.chim.unifi.it/orac/ 

[43] http://www.ime.unicamp.br/~martinez/packmol 

[44] http://esra.sourceforge.net/cgi-bin/index.cgi 

[45] http://mw.concord.org/modeler/ 

General references 

• M. P. Allen, D. J. Tildesley (1989) Computer simulation of liquids. Oxford University 
Press. ISBN 0-19-855645-4. 

• J. A. McCammon, S. C. Harvey (1987) Dynamics of Proteins and Nucleic Acids. 
Cambridge University Press. ISBN 0521307503 (hardback). 

• D. C. Rapaport (1996) The Art of Molecular Dynamics Simulation. ISBN 0-521-44561-2. 

• Frenkel, Daan; Smit, Berend (2002) [2001]. Understanding Molecular Simulation : from 
algorithms to applications. San Diego, California: Academic Press. ISBN 0-12-267351-4. 

• J. M. Haile (2001) Molecular Dynamics Simulation: Elementary Methods. ISBN 
0-471-18439-X 

• R. J. Sadus, Molecular Simulation of Fluids: Theory, Algorithms and Object-Orientation, 
2002, ISBN 0-444-51082-6 

• Oren M. Becker, Alexander D. Mackerell Jr, Benoit Roux, Masakatsu Watanabe (2001) 
Computational Biochemistry and Biophysics. Marcel Dekker. ISBN 0-8247-0455-X. 

• Andrew Leach (2001) Molecular Modelling: Principles and Applications. (2nd Edition) 
Prentice Hall. ISBN 978-0582382107. 

• Tamar Schlick (2002) Molecular Modeling and Simulation. Springer. ISBN 
0-387-95404-X. 

• William Graham Hoover (1991) Computational Statistical Mechanics, Elsevier, ISBN 
0-444-88192-1. 






External links 

• The Blue Gene Project (http://researchweb.watson.ibm.com/bluegene/) (IBM) 

• D. E. Shaw Research (http://deshawresearch.com/) (D. E. Shaw Research) 

• Molecular Physics (http://www.tandf.co.uk/journals/titles/00268976.asp) 

• Statistical mechanics of Nonequilibrium Liquids 
(http://www.phys.unsw.edu.au/~gary/book.html) Lecture Notes on non-equilibrium MD 

• Introductory Lecture on Classical Molecular Dynamics 
(http://www.fz-juelich.de/nic-series/volume10/sutmann.pdf) by Dr. Godehard Sutmann, NIC, 
Forschungszentrum Jülich, Germany 

• Introductory Lecture on Ab Initio Molecular Dynamics and Ab Initio Path Integrals 
(http://www.fz-juelich.de/nic-series/volume10/tuckerman2.pdf) by Mark E. Tuckerman, New 
York University, USA 

• Introductory Lecture on Ab initio molecular dynamics: Theory and Implementation 
(http://www.fz-juelich.de/nic-series/Volume1/marx.pdf) by Dominik Marx, Ruhr-Universität 
Bochum and Jürg Hutter, Universität Zürich 

• Atomic-scale Friction Research and Education Synergy Hub (AFRESH) 
(http://nsfafresh.org), an Engineering Virtual Organization for the atomic-scale friction 
community to share, archive, link, and discuss data, knowledge and tools related to 
atomic-scale friction. 

• AFRESH (http://nsfafresh.org/wiki/index.php?title=Computational_Tribology) also 
provides detailed information regarding computational methods such as Molecular 
Dynamics as it relates to atomic-scale friction research. 

DNA Dynamics 

DNA molecular dynamics modeling involves simulations of how DNA molecular geometry 
and topology change with time as a result of both intra- and intermolecular interactions 
of DNA. Whereas molecular models of deoxyribonucleic acid (DNA) molecules, such as 
closely packed spheres (CPK models) made of plastic, or metal wires for 'skeletal models', 
are useful representations of static DNA structures, their usefulness is very limited for 
representing complex DNA dynamics. Computer molecular modeling allows both 
animations and molecular dynamics simulations that are very important for understanding 
how DNA functions in vivo. 

A long-standing dynamic problem is how DNA "self-replication" takes place in living cells, 
which should involve transient uncoiling of supercoiled DNA fibers. Although DNA consists of 
relatively rigid, very large elongated biopolymer molecules called "fibers" or chains, its 
molecular structure in vivo undergoes dynamic configuration changes that involve 
dynamically attached water molecules, ions or proteins/enzymes. Supercoiling, packing 
with histones in chromosome structures, and other such supramolecular aspects also 
involve in vivo DNA topology, which is even more complex than DNA molecular geometry, 
thus turning molecular modeling of DNA dynamics into a series of challenging problems for 
biophysical chemists, molecular biologists and biotechnologists. DNA exists in 
multiple stable geometries (called conformational isomerism) and has a rather large 
number of configurational, quantum states that are close to each other in energy on the 
potential energy surface of the DNA molecule. 

Such varying molecular geometries can also be computed, at least in principle, by 
employing ab initio quantum chemistry methods that can attain high accuracy for small 
molecules, although claims that acceptable accuracy can also be achieved for 
polynucleotides, as well as DNA conformations, were recently made on the basis of VCD 
spectral data. Such quantum geometries define an important class of ab initio molecular 
models of DNA whose exploration has barely started, especially in connection with results 
obtained by VCD in solutions. More detailed comparisons with such ab initio quantum 
computations are in principle obtainable through 2D-FT NMR spectroscopy and relaxation 
studies of polynucleotide solutions or specifically labeled DNA, for example with 
deuterium labels. 



Importance of DNA molecular structure and dynamics 
modeling for Genomics and beyond 

From the very early stages of structural studies of DNA by X-ray diffraction and 
biochemical means, molecular models such as the Watson-Crick double-helix model were 
successfully employed to solve the 'puzzle' of DNA structure, and also to find how the latter 
relates to its key functions in living cells. The first high-quality X-ray diffraction patterns of 
A-DNA were reported by Rosalind Franklin and Raymond Gosling in 1953. The first 
reports of a double-helix molecular model of B-DNA structure were made by Watson and 
Crick in 1953 [2] [3]. Then Maurice F. Wilkins, A. Stokes and H.R. Wilson reported the first 
X-ray patterns of in vivo B-DNA in partially oriented salmon sperm heads. The 
development of the first correct double-helix molecular model of DNA by Crick and Watson 
may not have been possible without the biochemical evidence for the nucleotide 
base-pairing ([A-T]; [C-G]), or Chargaff's rules [7] [8] [9] [10] [12]. Although such initial 
studies of DNA structures with the help of molecular models were essentially static, their 
consequences for explaining the in vivo functions of DNA were significant in the areas of 
protein biosynthesis and the quasi-universality of the genetic code. Epigenetic 
transformation studies of DNA in vivo were, however, much slower to develop in spite of 
their importance for embryology, morphogenesis and cancer research. Such chemical 
dynamics and biochemical reactions of DNA are much more complex than the molecular 
dynamics of DNA physical interactions with water, ions and proteins/enzymes in living cells. 



Animated DNA molecular models and hydrogen-bonding 

Animated molecular models allow one to visually explore the three-dimensional (3D) 
structure of DNA. The first DNA model is a space-filling, or CPK, model of the DNA 
double-helix, whereas the third is an animated wire, or skeletal type, molecular model of 
DNA. The last two DNA molecular models in this series depict quadruplex DNA that may 
be involved in certain cancers. The first CPK model in the second row is a molecular 
model of hydrogen bonds between water molecules in ice that are broadly similar to those 
found in DNA; the hydrogen bonding dynamics and proton exchange, however, differ 
by many orders of magnitude between the two systems of fully hydrated DNA and 
water molecules in ice. Thus, DNA dynamics is complex, involving nanosecond and 
several tens of picosecond time scales, whereas that of liquid water is on the picosecond time 
scale, and that of proton exchange in ice is on the millisecond time scale; the proton 
exchange rates in DNA and attached proteins may vary from picoseconds to nanoseconds, 
minutes or years, depending on the exact locations of the exchanged protons in the large 
biopolymers. The simple harmonic oscillator 'vibration' in the third, animated image of the 
next gallery is only an oversimplified dynamic representation of the longitudinal vibrations 
of the DNA intertwined helices, which were found to be anharmonic rather than harmonic as 
often assumed in quantum dynamic simulations of DNA. 




[Figures: A-DNA; B-DNA] 

Human Genomics and Biotechnology Applications of DNA 
Molecular Modeling 

The following two galleries of images illustrate various uses of DNA molecular modeling in 
Genomics and Biotechnology research applications from DNA repair to PCR and DNA 
nanostructures; each slide contains its own explanation and/or details. The first slide 
presents an overview of DNA applications, including DNA molecular models, with emphasis 
on Genomics and Biotechnology. 

Applications of DNA molecular dynamics computations 

• First row images present a DNA biochip and DNA nanostructures designed for DNA 
computing and other dynamic applications of DNA nanotechnology; last image in this row 
is of DNA arrays that display a representation of the Sierpinski gasket on their surfaces. 

• Second row: the first two images show computer molecular models of RNA polymerase, 
followed by that of an E. coli bacterial DNA primase template, suggesting very complex 
dynamics at the interfaces between the enzymes and the DNA template; the fourth image 
illustrates in a computed molecular model the mutagenic, chemical interaction of a 
potent carcinogen molecule with DNA, and the last image shows the different 
interactions of specific fluorescence labels with DNA in human and orangutan 
chromosomes. 

Image Gallery: DNA Applications and Technologies at various scales 
in Biotechnology and Genomics research 

The first figure is an actual electron micrograph of a DNA fiber bundle, presumably of a 
single plasmid, bacterial DNA loop. 




[Figure: Exponential growth of short product] 

Databases for Genomics, DNA Dynamics and Sequencing 



Genomic and structural databases 

• CBS Genome Atlas Database: contains examples of base skews. [60] 

• The Z curve database of genomes: a 3-dimensional visualization and analysis tool for 
genomes. 

• DNA and other nucleic acids' molecular models: coordinate files of nucleic acid 
molecular structure models in PDB and CIF formats [10] 



[Figure: Mass spectrometry, MALDI informatics. Panels: data acquisition; nucleic acid 
quantitation; genotype, mutations, etc.] 

DNA Dynamics Data from Spectroscopy 

• FT-NMR [11] [12] 

• NMR Atlas-database [13] 

• mmcif downloadable coordinate files of nucleic acids in solution from 2D-FT NMR data 
[14] 

• NMR constraints files for NAs in PDB format [15] 

• NMR microscopy [16] 

• Vibrational circular dichroism (VCD) 

• Microwave spectroscopy 

• FT-IR 

• FT-NIR [17] [18] [19] 

• Spectral, hyperspectral, and chemical imaging [20] [21] [22] [23] [24] [25] [26] 

• Raman spectroscopy/microscopy [27] and CARS [28] 

• Fluorescence correlation spectroscopy [29] [30] [31] [32] [33] [34] [35] [36], fluorescence 
cross-correlation spectroscopy and FRET [37] [38] [39] 

• Confocal microscopy [40] 



Gallery: CARS (Raman spectroscopy), Fluorescence confocal 
microscopy, and Hyperspectral imaging 



[Figure labels: incident radiation; Raman medium; outgoing radiation; hyperspectral 
comparison] 

X-ray microscopy 

• Application of X-ray microscopy in the analysis of living hydrated cells [41] 

Atomic Force Microscopy (AFM) 

Two-dimensional DNA junction arrays have been visualized by Atomic Force Microscopy 
(AFM) [42]. Other imaging resources for AFM/Scanning probe microscopy (SPM) can be 
freely accessed at: 

• How SPM Works [43] 

• SPM Image Gallery - AFM STM SEM MFM NSOM and more. [44] 

Gallery of AFM Images of DNA Nanostructures 

See also 

• DNA nanotechnology 

• Imaging 

• Sirius visualization software 

• Atomic force microscopy 

• X-ray microscopy 

• Liquid crystals 

• Glasses 

• QMC@Home 

• Sir Lawrence Bragg, FRS 

• Sir John Randall 

• Francis Crick 

• Manfred Eigen 

• Felix Bloch 

• Paul Lauterbur 

• Maurice Wilkins 

• Herbert Wilson, FRS 

• Alex Stokes 

External links 

• DNA the Double Helix Game 
(http://nobelprize.org/educational_games/medicine/dna_double_helix/) From the official 
Nobel Prize web site 

• MDDNA: Structural Bioinformatics of DNA 
(http://humphry.chem.wesleyan.edu:8080/MDDNA/) 

• Double Helix 1953-2003 (http://www.ncbe.reading.ac.uk/DNA50/) National Centre 
for Biotechnology Education 

• DNA under electron microscope (http://www.fidelitysystems.com/Unlinked_DNA.html) 

• Ascalaph DNA (http://www.agilemolecule.com/Ascalaph/Ascalaph_DNA.html): 
commercial software for DNA modeling 

• DNAlive: a web interface to compute DNA physical properties 
(http://mmb.pcb.ub.es/DNAlive). Also allows cross-linking of the results with the UCSC 
Genome browser and DNA dynamics. 

• DiProDB: Dinucleotide Property Database (http://diprodb.fli-leibniz.de). The database is 
designed to collect and analyse thermodynamic, structural and other dinucleotide 
properties. 

• Further details of mathematical and molecular analysis of DNA structure based on X-ray 
data 
(http://planetphysics.org/encyclopedia/BesselFunctionsApplicationsToDiffractionByHelicalStructures.html) 

• Bessel functions corresponding to Fourier transforms of atomic or molecular helices. 
(http://planetphysics.org/?op=getobj&from=objects&name=BesselFunctionsAndTheirApplicationsToDiffractionByHelicalStructures) 

• Application of X-ray microscopy in analysis of living hydrated cells 
(http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=12379938) 

• Characterization in nanotechnology, some PDFs (http://nanocharacterization.sitesled.com/) 

• An overview of STM/AFM/SNOM principles with educative videos 
(http://www.ntmdt.ru/SPM-Techniques/Principles/) 

• SPM Image Gallery - AFM STM SEM MFM NSOM and More 
(http://www.rhk-tech.com/results/showcase.php) 

• How SPM Works (http://www.parkafm.com/New_html/resources/01general.php) 

• U.S. National DNA Day (http://www.genome.gov/10506367): watch videos and 
participate in real-time discussions with scientists. 

• The Secret Life of DNA - DNA Music compositions 
(http://www.tjmitchell.com/stuart/dna.html) 

2D-FT NMRI and Spectroscopy 

2D-FT nuclear magnetic resonance imaging (2D-FT NMRI), or two-dimensional 
Fourier transform nuclear magnetic resonance imaging (NMRI), is primarily a 
non-invasive imaging technique most commonly used in biomedical research and medical 
radiology/nuclear medicine/MRI to visualize structures and functions of living systems 
and single cells. For example, it can provide fairly detailed images of a human body in any 
selected cross-sectional plane, such as longitudinal, transversal or sagittal. The basic 
NMR phenomenon or physical principle is essentially the same in NMRI, nuclear 
magnetic resonance/FT (NMR) spectroscopy, topical NMR, or even in Electron Spin 
Resonance/EPR; however, the details are significantly different at present for EPR, as the 
static magnetic field was scanned to obtain spectra only in the early days of NMR, whereas 
this is still the case in many EPR or ESR spectrometers. NMRI, on the other hand, often 
utilizes a linear magnetic field gradient to obtain an image that combines the visualization 
of molecular structure and dynamics. It is this dynamic aspect of NMRI, as well as its highest 
sensitivity for the 1H nucleus, that distinguishes it very dramatically from X-ray CAT 
scanning, which 'misses' hydrogens because of their very low X-ray scattering factor. 

Thus, NMRI provides much greater contrast, especially for the different soft tissues of the 
body, than computed tomography (CT), as its most sensitive option observes the nuclear spin 
distribution and dynamics of highly mobile molecules that contain the naturally abundant, 
stable hydrogen isotope 1H, as in plasma water molecules, blood, dissolved metabolites and 
fats. This approach makes it most useful in cardiovascular, oncological (cancer), 
neurological (brain), musculoskeletal, and cartilage imaging. Unlike CT, it uses no ionizing 
radiation, and unlike nuclear imaging it does not employ any radioactive isotopes. 
Some of the first MRI images reported were published in 1973, and the first study 
performed on a human took place on July 3, 1977. Earlier papers were also published by 
Sir Peter Mansfield [4] in the UK (Nobel Laureate in 2003), and R. Damadian in the USA [5] 
(together with an approved patent for 'fonar', or magnetic imaging). The detailed physical 
theory of NMRI was published by Peter Mansfield in 1973. Unpublished 'high-resolution' 
(50 micron resolution) images of other living systems, such as hydrated wheat grains, were 
also obtained and communicated in the UK in 1977-1979, and were subsequently confirmed by 
articles published in Nature by Peter Callaghan. 

NMR Principle 



Certain nuclei, such as 1H nuclei, are "fermions" with spin-1/2 and therefore have two spin 
states, referred to as "up" and "down" states. The nuclear magnetic resonance absorption 
phenomenon occurs when samples containing such nuclear spins are placed in a static 
magnetic field and a very short radiofrequency pulse is applied with a center, or carrier, 
frequency matching that of the transition between the up and down states of the spin-1/2 1H 
nuclei that were polarized by the static magnetic field. Very low field schemes have also 
been recently reported. [8] 

[Figure: Advanced 4.7 T clinical diagnostics and biomedical research NMR imaging 
instrument.] 



Chemical Shifts 

NMR is a very useful family of techniques for chemical and biochemical research because 
of the chemical shift. This effect consists of a frequency shift of the nuclear magnetic 
resonance for specific chemical groups or atoms as a result of the partial shielding of the 
corresponding nuclei from the applied, static external magnetic field by the electron 
orbitals (or molecular orbitals) surrounding such nuclei in the chemical groups. 
Thus, the higher the electron density surrounding a specific nucleus, the larger the chemical 
shift will be. The resulting magnetic field at the nucleus is lower than the applied 
external magnetic field, and the resonance frequencies observed as a result of such 
shielding are lower than the value that would be observed in the absence of any electronic 
orbital shielding. Furthermore, in order to obtain a chemical shift value independent of the 
strength of the applied magnetic field and to allow the direct comparison of spectra 
obtained at different magnetic field values, the chemical shift is defined by the ratio of the 
local magnetic field strength at the observed (electron orbital-shielded) nucleus 
to the external magnetic field strength, H_loc / H_0. The first NMR observations of the 
chemical shift, with the correct physical chemistry interpretation, were reported for 
19F-containing compounds in the early 1950s by Herbert S. Gutowsky and Charles P. Slichter 
from the University of Illinois at Urbana (USA). 
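
The reason for defining the shift through a ratio of fields can be checked numerically. The 
Python sketch below is only an illustration: it assumes the simple shielding model 
H_loc = H_0 (1 - sigma), the shielding constant sigma = 25e-6 is a hypothetical value, and 
the function names are invented for this example. The 1H gyromagnetic ratio of about 
42.577 MHz per tesla is a standard physical constant. 

```python
GAMMA_1H_MHZ_PER_T = 42.577  # 1H gyromagnetic ratio divided by 2*pi, in MHz/T


def local_field(h0, sigma):
    """Field at the nucleus for applied field h0 (tesla) and shielding sigma."""
    return h0 * (1.0 - sigma)


def resonance_mhz(h0, sigma):
    """1H resonance frequency in MHz at the shielded nucleus."""
    return GAMMA_1H_MHZ_PER_T * local_field(h0, sigma)


def field_ratio(h0, sigma):
    """Ratio of local to applied field; it depends on sigma but not on h0."""
    return local_field(h0, sigma) / h0


if __name__ == "__main__":
    sigma = 25e-6  # hypothetical shielding constant
    for h0 in (1.5, 4.7):
        shift_hz = (resonance_mhz(h0, 0.0) - resonance_mhz(h0, sigma)) * 1e6
        print(f"H_0 = {h0} T: frequency lowered by {shift_hz:.0f} Hz, "
              f"H_loc/H_0 = {field_ratio(h0, sigma):.6f}")
```

The frequency offset in hertz grows with the applied field, while the ratio stays the same, 
which is what allows spectra recorded at different field strengths to be compared directly. 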

A related effect in metals is called the Knight shift, which is due only to the conduction 
electrons. Such conduction electrons present in metals induce an "additional" local field at 
the nuclear site, due to the spin re-orientation of the conduction electrons in the presence 
of the applied (constant), external magnetic field. This is only broadly "similar" to the
chemical shift in either solutions or diamagnetic solids.
NMR Imaging Principles

A number of methods have been devised for combining magnetic field gradients and
radiofrequency pulsed excitation to obtain an image. Two major methods involve either
2D-FT or 3D-FT reconstruction from projections, somewhat similar to Computed
Tomography, except that the interpretation of the NMR image must also include dynamic
and relaxation/contrast enhancement information. Other schemes involve building the NMR
image either point-by-point or line-by-line. Some schemes instead use gradients in the rf
field rather than in the static magnetic field. The majority of NMR images routinely
obtained are produced either by the Two-Dimensional Fourier Transform (2D-FT)
technique [10] (with slice selection), or by the Three-Dimensional Fourier Transform
(3D-FT) techniques, which are however much more time consuming at present. 2D-FT NMRI
is sometimes called a "spin-warp" in common parlance. An NMR image corresponds to a
spectrum consisting of a number of "spatial frequencies" at different locations in the
sample investigated, or in a patient [11]. A two-dimensional Fourier transformation of
such a "real" image may be considered as a representation of such "real waves" by a
matrix of spatial frequencies known as the k-space. We shall see next in some
mathematical detail how the 2D-FT computation works to obtain 2D-FT NMR images.

Two-dimensional Fourier transform imaging and 
spectroscopy 

A two-dimensional Fourier transform (2D-FT) is computed numerically or carried out in two
stages, both involving "standard", one-dimensional Fourier transforms. However, the
second-stage Fourier transform is not the inverse Fourier transform (which would result in
the original function that was transformed at the first stage), but a Fourier transform in a
second variable, which is "shifted" in value relative to that involved in the result of the
first Fourier transform. Such 2D-FT analysis is a very powerful method for both NMRI and
two-dimensional nuclear magnetic resonance spectroscopy (2D-FT NMRS) [12], which allows
the three-dimensional reconstruction of polymer and biopolymer structures at atomic
resolution [13] for biopolymers dissolved in aqueous solutions (for example) with
molecular weights (Mw) up to about 50,000. For larger biopolymers or polymers, more
complex methods have been developed to obtain the limited structural resolution needed for
partial 3D reconstructions of higher molecular structures, e.g. for Mw up to 900,000, or
even for oriented microcrystals in aqueous suspensions or single crystals; such methods
have also been reported for in vivo 2D-FT NMR spectroscopic studies of algae, bacteria,
yeast and certain mammalian cells, including human ones. The 2D-FT method is also widely
utilized in optical spectroscopy, such as 2D-FT NIR hyperspectral imaging (2D-FT NIR-HS),
and in MRI for research and for clinical, diagnostic applications in Medicine. In clinical
diagnostics, 2D-FT NIR-HS has recently allowed the identification of single, malignant
cancer cells surrounded by healthy human breast tissue at about 1 micron resolution, well
beyond the resolution obtainable by 2D-FT NMRI for such systems in the limited time
available for such diagnostic investigations (and also in magnetic fields up to the
FDA-approved magnetic field strength $H_0$ of 4.7 T, as shown in the top image of the
state-of-the-art NMRI instrument). A more precise mathematical definition of the "double"
(2D) Fourier transform involved in both 2D NMRI and 2D-FT NMRS is specified next, and a
precise example follows this generally accepted definition.
2D-FT Definition

A 2D-FT, or two-dimensional Fourier transform, is a standard Fourier transformation of a
function of two variables, $f(x_1, x_2)$, carried out first in the first variable $x_1$,
followed by the Fourier transform in the second variable $x_2$ of the resulting function
$F(s_1, x_2)$. Note that in the case of both 2D-FT NMRI and 2D-FT NMRS the two
independent variables in this definition are in the time domain, whereas the results of the
two successive Fourier transforms have, of course, frequencies as the independent
variables in NMRS, and ultimately spatial coordinates for both 2D NMRI and 2D-FT NMRS
following computer structural reconstructions based on special algorithms that are
different from FT or 2D-FT. Moreover, such structural algorithms are different for 2D NMRI
and 2D-FT NMRS: in the former case they involve macroscopic, or anatomical, structure
determination, whereas in the latter case of 2D-FT NMRS the atomic structure
reconstruction algorithms are based on the quantum theory of a microphysical (quantum)
process such as the nuclear Overhauser enhancement (NOE), or specific magnetic
dipole-dipole interactions [14] between neighboring nuclei.
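
The two-stage definition above can be checked numerically. The following is a minimal sketch using NumPy's discrete FFT on a hypothetical random 8 x 16 array standing in for sampled values of a two-variable function; the array size and variable names are assumptions made for illustration.

```python
import numpy as np

# Hypothetical sampled two-variable signal f(x1, x2) on an 8 x 16 grid.
rng = np.random.default_rng(0)
f = rng.standard_normal((8, 16)) + 1j * rng.standard_normal((8, 16))

# Stage 1: 1D Fourier transform along the first variable, giving F(s1, x2).
F_s1_x2 = np.fft.fft(f, axis=0)

# Stage 2: 1D Fourier transform along the second variable, giving F(s1, s2).
F_s1_s2 = np.fft.fft(F_s1_x2, axis=1)

# The two-stage result agrees with NumPy's direct 2D transform.
two_stage_matches_fft2 = np.allclose(F_s1_s2, np.fft.fft2(f))
print(two_stage_matches_fft2)
```

Note that the second stage is a forward transform along the other axis, not an inverse transform of the first stage.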

Example 1 

A 2D Fourier transformation and phase correction is applied to a set of 2D NMR (FID)
signals $s(t_1, t_2)$, yielding a real 2D-FT NMR 'spectrum' (a collection of 1D FT-NMR
spectra) represented by a matrix $S$ whose elements are

$$S(\nu_1, \nu_2) = \mathrm{Re} \iint \cos(\nu_1 t_1) \, \exp(-i \nu_2 t_2) \, s(t_1, t_2) \, dt_1 \, dt_2$$

where $\nu_1$ and $\nu_2$ denote the discrete indirect double-quantum and single-quantum
(detection) axes, respectively, in the 2D NMR experiments. Next, the covariance matrix is
calculated in the frequency domain according to the following equation

$$C(\nu_2', \nu_2) = S^T S = \sum_{\nu_1} \left[ S(\nu_1, \nu_2') \, S(\nu_1, \nu_2) \right],$$

with $\nu_2$ and $\nu_2'$ taking all possible single-quantum frequency values and with the
summation carried out over all discrete double-quantum frequencies $\nu_1$.
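
A discrete version of this example can be sketched as follows. The sketch assumes that the integrand has the form cos(nu1 t1) exp(-i nu2 t2) s(t1, t2), replaces the integrals by sums over sample indices, and uses random numbers in place of measured FID data; the grid sizes and frequency axes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2 = 6, 10  # samples along t1 (indirect axis) and t2 (detection axis)

# Hypothetical complex FID data s(t1, t2); real data would come from the spectrometer.
s = rng.standard_normal((n1, n2)) + 1j * rng.standard_normal((n1, n2))

t1 = np.arange(n1)
t2 = np.arange(n2)
nu1 = 2 * np.pi * np.arange(n1) / n1  # discrete angular frequencies, indirect axis
nu2 = 2 * np.pi * np.arange(n2) / n2  # discrete angular frequencies, detection axis

cos_part = np.cos(np.outer(nu1, t1))        # element [nu1, t1] = cos(nu1 * t1)
exp_part = np.exp(-1j * np.outer(t2, nu2))  # element [t2, nu2] = exp(-i * nu2 * t2)

# Real spectrum matrix S[nu1, nu2]: sums over t1 and t2 replace the double integral.
S = np.real(cos_part @ s @ exp_part)

# Covariance matrix C[nu2', nu2] = sum over nu1 of S[nu1, nu2'] * S[nu1, nu2].
C = S.T @ S
print(S.shape, C.shape)
```

The covariance matrix is square and symmetric in the two single-quantum frequency indices, since the sum runs over the double-quantum axis.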

Example 2 

Atomic structure from 2D-FT STEM images [15]: images of electron distributions in a
high-temperature cuprate superconductor "paracrystal" reveal both the domains (or
"location") and the local symmetry of the 'pseudo-gap' in the electron-pair correlation band
responsible for the high-temperature superconductivity effect (obtained at Cornell
University). So far there have been three Nobel prizes awarded for 2D-FT NMR/MRI during
1991-2003, and an additional, earlier Nobel prize for 2D-FT of X-ray data ('CAT scans');
recently the advanced possibilities of 2D-FT techniques in Chemistry, Physiology and
Medicine [16] received very significant recognition [17].

Brief explanation of NMRI diagnostic uses in Pathology 

As an example, a diseased tissue such as a malignant tumor can be detected by 2D-FT NMRI
because the hydrogen nuclei of molecules in different tissues return to their equilibrium
spin state at different relaxation rates, and also because of the manner in which a
malignant tumor spreads and grows rapidly along the blood vessels adjacent to the tumor,
inducing further vascularization. By changing the pulse delays in the RF pulse
sequence employed, and/or the RF pulse sequence itself, one may obtain a
"relaxation-based contrast", or contrast enhancement between different types of body
tissue, such as normal vs. diseased tissue cells. Excluded from such diagnostic
observations by NMRI are all patients with ferromagnetic metal implants (e.g., cochlear
implants), and all cardiac pacemaker patients, who cannot undergo any NMRI scan because
the very intense magnetic and RF fields employed in NMRI would strongly interfere with
the correct functioning of such pacemakers. It is, however, conceivable that future
developments may combine NMRI diagnostics with treatments based on special techniques
involving applied magnetic fields and very high frequency RF. Already, surgery with
special tools is being tried experimentally in the presence of NMR imaging of subjects.
NMRI is used to image almost every part of the body, and is especially useful for
diagnosis in neurological conditions and disorders of the muscles and joints, and for
evaluating tumors (such as in lung or skin cancers) and abnormalities in the heart
(especially in children with hereditary disorders), blood vessels, CAD, atherosclerosis
and cardiac infarcts [18] (courtesy of Dr. Robert R. Edelman).

See also 
External links

• Cardiac Infarct or "heart attack" Imaged in Real Time by 2D-FT NMRI
(http://www.mr-tip.com/exam_gifs/cardiac_infarct_short_axis_cine_6.gif)

• Interactive Flash Animation on MRI (http://www.e-mri.org) - Online Magnetic
Resonance Imaging physics and technique course

• Herbert S. Gutowsky

• Jiri Jonas and Charles P. Slichter: NMR Memoires at NAS about Herbert Sander
Gutowsky; NAS = National Academy of Sciences, USA
(http://books.nap.edu/html/biomems/hgutowsky.pdf)

• 3D Animation Movie about MRI Exam (http://www.patiencys.com/MRI/)

• International Society for Magnetic Resonance in Medicine (http://www.ismrm.org)

• Danger of objects flying into the scanner
(http://www.simplyphysics.com/flying_objects.html)

Related Wikipedia websites 

• Medical imaging 

• Computed tomography 

• Magnetic resonance microscopy 

• Fourier transform spectroscopy 

• FT-NIRS 

• Magnetic resonance elastography 

• Nuclear magnetic resonance (NMR) 

• Chemical shift 

• Relaxation 

• Robinson oscillator 

• Earth's field NMR (EFNMR) 

• Rabi cycle 

This article incorporates material by the original author from 2D-FT MR-Imaging and
related Nobel awards (http://planetphysics.org/encyclopedia/2DFTImaging.html) on
PlanetPhysics (http://planetphysics.org/), which is licensed under the GFDL.
Chemical imaging



Chemical imaging is the simultaneous measurement of spectra (chemical information)
and images or pictures (spatial information). The technique is most often applied to
either solid or gel samples, and has applications in chemistry, biology, medicine [10],
pharmacy [11] (see also, for example, Chemical Imaging Without Dyeing [12]), food
science, biotechnology [13][14], agriculture and industry (see, for example, NIR
Chemical Imaging in Pharmaceutical Industry [15] and Pharmaceutical Process Analytical
Technology [16]). NIR, IR and Raman chemical imaging is also referred to as hyperspectral,
spectroscopic, spectral or multispectral imaging (also see microspectroscopy). However,
other ultra-sensitive and selective chemical imaging techniques that involve either
UV-visible or fluorescence microspectroscopy are also in use. Chemical imaging techniques
can be used to analyze samples of all sizes, from the single molecule [17][18] to the
cellular level in biology and medicine [19][20][21], and to images of planetary systems in
astronomy, but different instrumentation is employed for making observations on such
widely different systems.

Chemical imaging instrumentation is composed of three components: a radiation source to 
illuminate the sample, a spectrally selective element, and usually a detector array (the 
camera) to collect the images. When many stacked spectral channels (wavelengths) are 
collected for different locations of the microspectrometer focus on a line or planar array in 
the focal plane, the data is called hyperspectral; fewer wavelength data sets are called 
multispectral. The data format is called a hypercube. The data set may be visualized as a 
three-dimensional block of data spanning two spatial dimensions (x and y), with a series of 
wavelengths (lambda) making up the third (spectral) axis. The hypercube can be visually 
and mathematically treated as a series of spectrally resolved images (each image plane 
corresponding to the image at one wavelength) or a series of spatially resolved spectra. The 
analyst may choose to view the spectrum measured at a particular spatial location; this is 
useful for chemical identification. Alternatively, selecting an image plane at a particular 
wavelength can highlight the spatial distribution of sample components, provided that their 
spectral signatures are different at the selected wavelength. 
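
The two views of a hypercube described above can be sketched with a NumPy array. The array dimensions, the wavelength axis and the selected pixel and wavelength are hypothetical, and random numbers stand in for measured intensities.

```python
import numpy as np

# Hypothetical hypercube: 64 x 64 pixels (y, x) and 100 wavelength channels.
n_y, n_x, n_wavelengths = 64, 64, 100
wavelengths_nm = np.linspace(1000.0, 2500.0, n_wavelengths)  # assumed spectral axis
rng = np.random.default_rng(2)
hypercube = rng.random((n_y, n_x, n_wavelengths))

# View 1: the spectrum measured at one spatial location (row 10, column 20).
spectrum = hypercube[10, 20, :]

# View 2: the image plane at the channel closest to a chosen wavelength.
target_nm = 1940.0
channel = int(np.argmin(np.abs(wavelengths_nm - target_nm)))
image_plane = hypercube[:, :, channel]

print(spectrum.shape, image_plane.shape)
```

Slicing along the third axis gives a spatially resolved spectrum, while fixing the third index gives a spectrally resolved image, which is the same data seen in two ways.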

Many materials, both manufactured and naturally occurring, derive their functionality from 
the spatial distribution of sample components. For example, extended release 
pharmaceutical formulations can be achieved by using a coating that acts as a barrier layer. 
The release of active ingredient is controlled by the presence of this barrier, and 
imperfections in the coating, such as discontinuities, may result in altered performance. In 
the semi-conductor industry, irregularities or contaminants in silicon wafers or printed 
micro-circuits can lead to failure of these components. The functionality of biological 
systems is also dependent upon chemical gradients - a single cell, tissue, and even whole 
organs function because of the very specific arrangement of components. It has been shown 
that even small changes in chemical composition and distribution may be an early indicator 
of disease. 

Any material that depends on chemical gradients for functionality may be amenable to 
study by an analytical technique that couples spatial and chemical characterization. To 
efficiently and effectively design and manufacture such materials, the 'what' and the 
'where' must both be measured. The demand for this type of analysis is increasing as 
manufactured materials become more complex. Chemical imaging techniques not only
permit visualization of the spatially resolved chemical information that is critical to
understanding modern manufactured products, but are also non-destructive, so
that samples are preserved for further testing.

History 

Commercially available laboratory-based chemical imaging systems emerged in the early 
1990s (ref. 1-5). In addition to economic factors, such as the need for sophisticated 
electronics and extremely high-end computers, a significant barrier to commercialization of 
infrared imaging was that the focal plane arrays (FPAs) needed to read IR images were not
readily available as commercial items. As high-speed electronics and sophisticated 
computers became more commonplace, and infrared cameras became readily commercially 
available, laboratory chemical imaging systems were introduced. 

Initially used for novel research in specialized laboratories, chemical imaging became a 
more commonplace analytical technique used for general R&D, quality assurance (QA) and 
quality control (QC) in less than a decade. The rapid acceptance of the technology in a 
variety of industries (pharmaceutical, polymers, semiconductors, security, forensics and 
agriculture) rests on the wealth of information characterizing both chemical composition
and morphology. The parallel nature of chemical imaging data makes it possible to analyze 
multiple samples simultaneously for applications that require high throughput analysis in 
addition to characterizing a single sample. 

Principles 

Chemical imaging shares the fundamentals of vibrational spectroscopic techniques, but 
provides additional information by way of the simultaneous acquisition of spatially resolved 
spectra. It combines the advantages of digital imaging with the attributes of spectroscopic 
measurements. Briefly, vibrational spectroscopy measures the interaction of light with 
matter. Photons that interact with a sample are either absorbed or scattered; photons of 
specific energy are absorbed, and the pattern of absorption provides information, or a 
fingerprint, on the molecules that are present in the sample. 

On the other hand, in terms of the observation setup, chemical imaging can be carried out 
in one of the following modes: (optical) absorption, emission (fluorescence), (optical) 
transmission or scattering (Raman). A consensus currently exists that the fluorescence 
(emission) and Raman scattering modes are the most sensitive and powerful, but also the 
most expensive. 

In a transmission measurement, the radiation goes through a sample and is measured by a 
detector placed on the far side of the sample. The energy transferred from the incoming 
radiation to the molecule(s) can be calculated as the difference between the quantity of 
photons that were emitted by the source and the quantity that is measured by the detector. 
In a diffuse reflectance measurement, the same energy difference measurement is made, 
but the source and detector are located on the same side of the sample, and the photons 
that are measured have re-emerged from the illuminated side of the sample rather than 
passed through it. The energy may be measured at one or multiple wavelengths; when a 
series of measurements are made, the response curve is called a spectrum. 

A key element in acquiring spectra is that the radiation must somehow be energy selected,
either before or after interacting with the sample. Wavelength selection can be
accomplished with a fixed filter, a tunable filter, a spectrograph, an interferometer, or
other devices. For a fixed filter approach, it is not efficient to collect a significant
number of wavelengths, and multispectral data are usually collected. Interferometer-based
chemical imaging requires that entire spectral ranges be collected, and therefore results
in hyperspectral data. Tunable filters have the flexibility to provide either multi- or
hyperspectral data, depending on analytical requirements.

Spectra may be measured one point at a time using a single element detector (single-point
mapping), as a line-image using a linear array detector (typically 16 to 28 pixels) (linear
array mapping), or as a two-dimensional image using a Focal Plane Array (FPA) (typically
256 to 16,384 pixels) (FPA imaging). For single-point mapping, the sample is moved in the
x and y directions point-by-point using a computer-controlled stage. With linear array
mapping, the sample is moved line-by-line with a computer-controlled stage. FPA imaging
data are collected with a two-dimensional FPA detector, hence capturing the full desired
field-of-view at one time for each individual wavelength, without having to move the
sample. FPA imaging, with its ability to collect tens of thousands of spectra
simultaneously, is orders of magnitude faster than linear arrays, which can typically
collect 16 to 28 spectra simultaneously and which are in turn much faster than single-point
mapping.
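
The difference in speed between the three acquisition modes can be illustrated by counting how many stage positions are needed to cover the same pixel grid. This is a rough sketch only: the function `stage_positions`, the 128 x 128 grid, the 16-pixel line detector and the 128 x 128 FPA are hypothetical choices, and the count ignores the time spent stepping through wavelengths at each position.

```python
import math

def stage_positions(n_rows, n_cols, mode, line_pixels=16, fpa_shape=(128, 128)):
    """Number of sample positions needed to cover an n_rows x n_cols pixel grid."""
    if mode == "single_point":
        # One spectrum per position.
        return n_rows * n_cols
    if mode == "linear_array":
        # Each position records line_pixels spectra along one row.
        return n_rows * math.ceil(n_cols / line_pixels)
    if mode == "fpa":
        # Each position records a full fpa_shape block of spectra.
        return math.ceil(n_rows / fpa_shape[0]) * math.ceil(n_cols / fpa_shape[1])
    raise ValueError(f"unknown mode: {mode}")

for mode in ("single_point", "linear_array", "fpa"):
    print(mode, stage_positions(128, 128, mode))
```

Under these assumptions the single-point scan needs thousands of positions, the line scan roughly a thousand, and the FPA a single one.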

Terminology 

Some words common in spectroscopy, optical microscopy and photography have been 
adapted or their scope modified for their use in chemical imaging. They include: resolution, 
field of view and magnification. There are two types of resolution in chemical imaging. The 
spectral resolution refers to the ability to resolve small energy differences; it applies to the 
spectral axis. The spatial resolution is the minimum distance between two objects that is 
required for them to be detected as distinct objects. The spatial resolution is influenced by 
the field of view, a physical measure of the size of the area probed by the analysis. In 
imaging, the field of view is a product of the magnification and the number of pixels in the 
detector array. The magnification is a ratio of the physical area of the detector array 
divided by the area of the sample field of view. Higher magnifications for the same detector 
image a smaller area of the sample. 
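
The relation between magnification and field of view can be sketched numerically, using the area-ratio definition of magnification given above. The function name, the square 128 x 128 detector and the 40 micrometre pixel pitch are hypothetical values chosen for illustration.

```python
import math

def sample_field_of_view(n_pixels_side, pixel_pitch_um, area_magnification):
    """Side length of the sample field of view and the sample footprint of one pixel.

    Assumes a square detector; area_magnification is detector area / sample area.
    """
    detector_side_um = n_pixels_side * pixel_pitch_um
    detector_area_um2 = detector_side_um ** 2
    fov_area_um2 = detector_area_um2 / area_magnification
    fov_side_um = math.sqrt(fov_area_um2)
    pixel_footprint_um = fov_side_um / n_pixels_side
    return fov_side_um, pixel_footprint_um

# Same detector at two magnifications: the higher one images a smaller area.
low_mag = sample_field_of_view(128, 40.0, 16.0)
high_mag = sample_field_of_view(128, 40.0, 64.0)
print(low_mag, high_mag)
```

The pixel footprint on the sample shrinks together with the field of view, which is why the field of view influences the achievable spatial resolution.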

Types of vibrational chemical imaging instruments 

Chemical imaging has been implemented for mid-infrared, near-infrared and Raman
spectroscopy. As with their bulk spectroscopy counterparts, each imaging technique
has particular strengths and weaknesses, and is best suited to fulfill different needs.

Mid-infrared chemical imaging

Mid-infrared (MIR) spectroscopy probes fundamental molecular vibrations, which arise in
the spectral range 2,500-25,000 nm. Commercial imaging implementations in the MIR
region typically employ Fourier Transform Infrared (FT-IR) interferometers and the range is
more commonly presented in wavenumber, 4,000 - 400 cm^-1. The MIR absorption bands
tend to be relatively narrow and well-resolved; direct spectral interpretation is often
possible by an experienced spectroscopist. MIR spectroscopy can distinguish subtle
changes in chemistry and structure, and is often used for the identification of unknown
materials. The absorptions in this spectral range are relatively strong; for this reason,
sample presentation is important to limit the amount of material interacting with the
incoming radiation in the MIR region. Most data in this range are collected in
transmission mode through thin sections (~10 micrometres) of material. Water is a very
strong absorber of MIR radiation and wet samples often require advanced sampling
procedures (such as attenuated total reflectance). Commercial instruments include point
and line mapping, and imaging. All employ an FT-IR interferometer as wavelength selective
element and light source.
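
The correspondence between the wavelength and wavenumber ranges quoted above follows from the reciprocal relation between the two units. A minimal sketch, with a hypothetical function name:

```python
def wavelength_nm_to_wavenumber_cm1(wavelength_nm):
    """Convert a wavelength in nanometres to a wavenumber in cm^-1 (1 cm = 1e7 nm)."""
    return 1.0e7 / wavelength_nm

# Limits of the MIR range given in the text.
short_limit = wavelength_nm_to_wavenumber_cm1(2500.0)
long_limit = wavelength_nm_to_wavenumber_cm1(25000.0)
print(short_limit, long_limit)
```

The shorter wavelength maps to the larger wavenumber, which is why the wavenumber range is written from high to low.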




Remote chemical imaging of a simultaneous release of SF6 and NH3 at 1.5 km using the
FIRST imaging spectrometer



For types of MIR microscope, see Microscopy#infrared microscopy.

Atmospheric windows in the infrared spectrum are also employed to perform chemical
imaging remotely. In these spectral regions the atmospheric gases (mainly water and CO2)
present low absorption and allow infrared viewing over kilometer distances. Target
molecules can then be viewed using the selective absorption/emission processes described
above. An example of the chemical imaging of a simultaneous release of SF6 and NH3 is
shown in the image.

Near-infrared chemical imaging

The analytical near infrared (NIR) region spans the range from approximately 700-2,500 
nm. The absorption bands seen in this spectral range arise from overtones and combination 
bands of O-H, N-H, C-H and S-H stretching and bending vibrations. Absorption is one to two 
orders of magnitude smaller in the NIR compared to the MIR; this phenomenon eliminates
the need for extensive sample preparation. Thick and thin samples can be analyzed without
any sample preparation; it is possible to acquire NIR chemical images through some
packaging materials; and the technique can be used to examine hydrated samples, within
limits. Intact samples can be imaged in transmittance or diffuse reflectance.

The lineshapes for overtone and combination bands tend to be much broader and more 
overlapped than for the fundamental bands seen in the MIR. Often, multivariate methods 
are used to separate spectral signatures of sample components. NIR chemical imaging is 
particularly useful for performing rapid, reproducible and non-destructive analyses of 
known materials [23][24]. NIR imaging instruments are typically based on one of two
platforms: imaging using a tunable filter and broad band illumination, and line mapping 
employing an FT-IR interferometer as the wavelength filter and light source. 

Raman chemical imaging 

The Raman shift chemical imaging spectral range spans from approximately 50 to 4,000
cm^-1; the actual spectral range over which a particular Raman measurement is made is a
function of the laser excitation frequency. The basic principle behind Raman spectroscopy
differs from the MIR and NIR in that the x-axis of the Raman spectrum is measured as a
function of energy shift (in cm^-1) relative to the frequency of the laser used as the
source of radiation. Briefly, the Raman spectrum arises from inelastic scattering of
incident photons, which requires a change in polarizability with vibration, as opposed to
infrared absorption, which requires a change in dipole moment with vibration. The end
result is spectral information that is similar and in many cases complementary to the MIR.
The Raman effect is weak - only about one in 10^7 photons incident to the sample undergoes
Raman scattering.
Both organic and inorganic materials possess a Raman spectrum; they generally produce 
sharp bands that are chemically specific. Fluorescence is a competing phenomenon and, 
depending on the sample, can overwhelm the Raman signal, for both bulk spectroscopy and 
imaging implementations. 
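
Because the Raman axis is a shift relative to the laser line, the same shift range corresponds to different absolute wavelengths for different lasers. The sketch below assumes Stokes scattering (scattered light at longer wavelength than the laser); the function names and the two laser wavelengths are hypothetical examples.

```python
def raman_shift_cm1(laser_nm, scattered_nm):
    """Raman shift in cm^-1 from laser and scattered wavelengths given in nm."""
    return 1.0e7 / laser_nm - 1.0e7 / scattered_nm

def scattered_wavelength_nm(laser_nm, shift_cm1):
    """Wavelength in nm of Stokes-scattered light for a given Raman shift."""
    return 1.0e7 / (1.0e7 / laser_nm - shift_cm1)

# Absolute wavelength window covering shifts of 50 to 4,000 cm^-1 for two lasers.
for laser_nm in (532.0, 785.0):
    window_start = scattered_wavelength_nm(laser_nm, 50.0)
    window_end = scattered_wavelength_nm(laser_nm, 4000.0)
    print(f"{laser_nm} nm laser: {window_start:.1f} nm to {window_end:.1f} nm")
```

The detector therefore has to cover a different wavelength window for each excitation laser, even though the reported shift axis is the same.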

Raman chemical imaging requires little or no sample preparation. However, physical 
sample sectioning may be used to expose the surface of interest, with care taken to obtain a 
surface that is as flat as possible. The conditions required for a particular measurement 
dictate the level of invasiveness of the technique, and samples that are sensitive to high 
power laser radiation may be damaged during analysis. Raman imaging is relatively
insensitive to the presence of water in the sample and is therefore useful for imaging
samples that contain water, such as biological material.

Fluorescence imaging (visible and NIR) 

This emission microspectroscopy mode is the most sensitive in both visible and FT-NIR 
microspectroscopy, and has therefore numerous biomedical, biotechnological and 
agricultural applications. There are several powerful, highly specific and sensitive 
fluorescence techniques that are currently in use, or still being developed; among the 
former are FLIM, FRAP, FRET and FLIM-FRET; among the latter are NIR fluorescence and 
probe-sensitivity enhanced NIR fluorescence microspectroscopy and nanospectroscopy 
techniques (see "Further reading" section). 

Sampling and samples 

The value of imaging lies in the ability to resolve spatial heterogeneities in solid-state or 
gel/gel-like samples. Imaging a liquid or even a suspension has limited use as constant 
sample motion serves to average spatial information, unless ultra-fast recording techniques 
are employed as in fluorescence correlation microspectroscopy or FLIM observations where
a single molecule may be monitored at extremely high (photon) detection speed. 
High-throughput experiments (such as imaging multi-well plates) of liquid samples can 
however provide valuable information. In this case, the parallel acquisition of thousands of 
spectra can be used to compare differences between samples, rather than the more 
common implementation of exploring spatial heterogeneity within a single sample. 

Similarly, there is no benefit in imaging a truly homogeneous sample, as a single point 
spectrometer will generate the same spectral information. Of course the definition of 
homogeneity is dependent on the spatial resolution of the imaging system employed. For 
MIR imaging, where wavelengths span from 3-10 micrometres, objects on the order of 5 
micrometres may theoretically be resolved. The sampled areas are limited by current 
experimental implementations because illumination is provided by the interferometer. 
Raman imaging may be able to resolve particles less than 1 micrometre in size, but the 
sample area that can be illuminated is severely limited. With Raman imaging, it is 
considered impractical to image large areas and, consequently, large samples. FT-NIR 
chemical/hyperspectral imaging usually resolves only larger objects (>10 micrometres), 
and is better suited for large samples because illumination sources are readily available. 
However, FT-NIR microspectroscopy was recently reported to be capable of about 1.2
micron (micrometer) resolution in biological samples [25]. Furthermore, two-photon
excitation FCS experiments were reported to have attained 15 nanometer resolution on
biomembrane thin films with a special coincidence photon-counting setup.

Detection limit 

The concept of the detection limit for chemical imaging is quite different than for bulk 
spectroscopy, as it depends on the sample itself. Because a bulk spectrum represents an 
average of the materials present, the spectral signatures of trace components are simply 
overwhelmed by dilution. In imaging however, each pixel has a corresponding spectrum. If 
the physical size of the trace contaminant is on the order of the pixel size imaged on the 
sample, its spectral signature will likely be detectable. If however, the trace component is 
dispersed homogeneously (relative to pixel image size) throughout a sample, it will not be 
detectable. Therefore, detection limits of chemical imaging techniques are strongly 
influenced by particle size, the chemical and spatial heterogeneity of the sample, and the 
spatial resolution of the image. 
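
The dependence of the detection limit on spatial distribution can be illustrated with a synthetic data set. In this sketch the same total amount of a trace component is either concentrated in one pixel or spread evenly over all pixels; the Gaussian band, the image size and the per-pixel noise level are hypothetical, and the background spectrum is taken as already subtracted.

```python
import numpy as np

n_rows, n_cols, n_channels = 32, 32, 50
n_pixels = n_rows * n_cols
channels = np.arange(n_channels)

# Hypothetical spectral band of the trace component (unit peak height).
band = np.exp(-0.5 * ((channels - 25) / 2.0) ** 2)

# Case A: all of the trace component sits in a single pixel.
localized = np.zeros((n_rows, n_cols, n_channels))
localized[5, 7, :] = band

# Case B: the same total amount is dispersed evenly over every pixel.
dispersed = np.tile(band / n_pixels, (n_rows, n_cols, 1))

noise_level = 0.01  # assumed per-pixel noise, for comparison only

# A bulk measurement averages over all pixels and cannot tell the cases apart.
bulk_a = localized.mean(axis=(0, 1))
bulk_b = dispersed.mean(axis=(0, 1))

# An imaging measurement looks at the strongest single-pixel signal.
best_pixel_a = localized.max()
best_pixel_b = dispersed.max()

print(bulk_a.max(), best_pixel_a, best_pixel_b)
```

Under these assumptions the bulk spectrum is identical in both cases and lies below the noise level, while the image detects the localized component easily and misses the dispersed one.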

Data analysis 

Data analysis methods for chemical imaging data sets typically employ mathematical 
algorithms common to single point spectroscopy or to image analysis. The reasoning is that 
the spectrum acquired by each detector is equivalent to a single point spectrum; therefore 
pre-processing, chemometrics and pattern recognition techniques are utilized with the 
similar goal to separate chemical and physical effects and perform a qualitative or 
quantitative characterization of individual sample components. In the spatial dimension, 
each chemical image is equivalent to a digital image and standard image analysis and 
robust statistical analysis can be used for feature extraction. 
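
One common way to apply single-point chemometrics to a hypercube is to unfold it into a table of spectra, analyze that table, and fold the result back into images. The sketch below uses principal component analysis computed with a singular value decomposition as an example of such a method; this choice, the array sizes and the random data are assumptions made for illustration, not a method prescribed by the text.

```python
import numpy as np

rng = np.random.default_rng(3)
n_rows, n_cols, n_channels = 20, 30, 40
hypercube = rng.random((n_rows, n_cols, n_channels))  # stand-in for measured data

# Unfold: every pixel becomes one row, i.e. one single-point spectrum.
spectra = hypercube.reshape(n_rows * n_cols, n_channels)

# Pre-processing: mean-center each wavelength channel.
centered = spectra - spectra.mean(axis=0)

# Principal component analysis via singular value decomposition.
U, singular_values, Vt = np.linalg.svd(centered, full_matrices=False)
n_components = 3
loadings = Vt[:n_components]      # spectral signatures, one per component
scores = centered @ loadings.T    # contribution of each component at each pixel

# Refold: each component's scores form an image for standard image analysis.
score_images = scores.reshape(n_rows, n_cols, n_components)
print(loadings.shape, score_images.shape)
```

The loadings are read along the spectral axis and the score images along the spatial axes, mirroring the split between spectral and image analysis described above.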

See also 

• Multispectral image 

• Microspectroscopy 

• Imaging spectroscopy 
GNU Free Documentation License

Version 1.2, November 2002

Copyright (C) 2000,2001,2002 Free Software Foundation, Inc. 
51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA 
Everyone is permitted to copy and distribute verbatim copies 
of this license document, but changing it is not allowed. 

0. PREAMBLE 

The purpose of this License is to make a manual, textbook, or other functional and useful document "free" in the sense of freedom: to assure everyone 
the effective freedom to copy and redistribute it, with or without modifying it, either commercially or noncommercially. Secondarily, this License 
preserves for the author and publisher a way to get credit for their work, while not being considered responsible for modifications made by others. 
This License is a kind of "copyleft", which means that derivative works of the document must themselves be free in the same sense. It complements the 
GNU General Public License, which is a copyleft license designed for free software. 

We have designed this License in order to use it for manuals for free software, because free software needs free documentation: a free program should 
come with manuals providing the same freedoms that the software does. But this License is not limited to software manuals; it can be used for any 
textual work, regardless of subject matter or whether it is published as a printed book. We recommend this License principally for works whose purpose 
is instruction or reference. 

1. APPLICABILITY AND DEFINITIONS 

This License applies to any manual or other work, in any medium, that contains a notice placed by the copyright holder saying it can be distributed under 
the terms of this License. Such a notice grants a world-wide, royalty-free license, unlimited in duration, to use that work under the conditions stated 
herein. The "Document", below, refers to any such manual or work. Any member of the public is a licensee, and is addressed as "you". You accept the 
license if you copy, modify or distribute the work in a way requiring permission under copyright law.

A "Modified Version" of the Document means any work containing the Document or a portion of it, either copied verbatim, or with modifications and/or 
translated into another language. 

A "Secondary Section" is a named appendix or a front-matter section of the Document that deals exclusively with the relationship of the publishers or 
authors of the Document to the Document's overall subject (or to related matters) and contains nothing that could fall directly within that overall subject. 
(Thus, if the Document is in part a textbook of mathematics, a Secondary Section may not explain any mathematics.) The relationship could be a matter 
of historical connection with the subject or with related matters, or of legal, commercial, philosophical, ethical or political position regarding them. 

The "Invariant Sections" are certain Secondary Sections whose titles are designated, as being those of Invariant Sections, in the notice that says that the 
Document is released under this License. If a section does not fit the above definition of Secondary then it is not allowed to be designated as Invariant. 
The Document may contain zero Invariant Sections. If the Document does not identify any Invariant Sections then there are none. 

The "Cover Texts" are certain short passages of text that are listed, as Front-Cover Texts or Back-Cover Texts, in the notice that says that the Document 
is released under this License. A Front-Cover Text may be at most 5 words, and a Back-Cover Text may be at most 25 words. 

A "Transparent" copy of the Document means a machine-readable copy, represented in a format whose specification is available to the general public, 
that is suitable for revising the document straightforwardly with generic text editors or (for images composed of pixels) generic paint programs or (for 
drawings) some widely available drawing editor, and that is suitable for input to text formatters or for automatic translation to a variety of formats 
suitable for input to text formatters. A copy made in an otherwise Transparent file format whose markup, or absence of markup, has been arranged to 
thwart or discourage subsequent modification by readers is not Transparent. An image format is not Transparent if used for any substantial amount of 
text. A copy that is not "Transparent" is called "Opaque". 

Examples of suitable formats for Transparent copies include plain ASCII without markup, Texinfo input format, LaTeX input format, SGML or XML using 
a publicly available DTD, and standard-conforming simple HTML, PostScript or PDF designed for human modification. Examples of transparent image 
formats include PNG, XCF and JPG. Opaque formats include proprietary formats that can be read and edited only by proprietary word processors, SGML 
or XML for which the DTD and/or processing tools are not generally available, and the machine-generated HTML, PostScript or PDF produced by some 
word processors for output purposes only. 

The "Title Page" means, for a printed book, the title page itself, plus such following pages as are needed to hold, legibly, the material this License 
requires to appear in the title page. For works in formats which do not have any title page as such, "Title Page" means the text near the most prominent 
appearance of the work's title, preceding the beginning of the body of the text. 

A section "Entitled XYZ" means a named subunit of the Document whose title either is precisely XYZ or contains XYZ in parentheses following text that 
translates XYZ in another language. (Here XYZ stands for a specific section name mentioned below, such as "Acknowledgements", "Dedications", 
"Endorsements", or "History".) To "Preserve the Title" of such a section when you modify the Document means that it remains a section "Entitled XYZ" 
according to this definition. 

The Document may include Warranty Disclaimers next to the notice which states that this License applies to the Document. These Warranty Disclaimers 
are considered to be included by reference in this License, but only as regards disclaiming warranties: any other implication that these Warranty 
Disclaimers may have is void and has no effect on the meaning of this License. 

2. VERBATIM COPYING 

You may copy and distribute the Document in any medium, either commercially or noncommercially, provided that this License, the copyright notices, 
and the license notice saying this License applies to the Document are reproduced in all copies, and that you add no other conditions whatsoever to 
those of this License. You may not use technical measures to obstruct or control the reading or further copying of the copies you make or distribute. 
However, you may accept compensation in exchange for copies. If you distribute a large enough number of copies you must also follow the conditions in 
section 3. 

You may also lend copies, under the same conditions stated above, and you may publicly display copies. 

3. COPYING IN QUANTITY 

If you publish printed copies (or copies in media that commonly have printed covers) of the Document, numbering more than 100, and the Document's 
license notice requires Cover Texts, you must enclose the copies in covers that carry, clearly and legibly, all these Cover Texts: Front-Cover Texts on the 
front cover, and Back-Cover Texts on the back cover. Both covers must also clearly and legibly identify you as the publisher of these copies. The front 
cover must present the full title with all words of the title equally prominent and visible. You may add other material on the covers in addition. Copying 
with changes limited to the covers, as long as they preserve the title of the Document and satisfy these conditions, can be treated as verbatim copying in 
other respects. 

If the required texts for either cover are too voluminous to fit legibly, you should put the first ones listed (as many as fit reasonably) on the actual cover, 
and continue the rest onto adjacent pages. 

If you publish or distribute Opaque copies of the Document numbering more than 100, you must either include a machine-readable Transparent copy 
along with each Opaque copy, or state in or with each Opaque copy a computer-network location from which the general network-using public has 
access to download using public-standard network protocols a complete Transparent copy of the Document, free of added material. If you use the latter 
option, you must take reasonably prudent steps, when you begin distribution of Opaque copies in quantity, to ensure that this Transparent copy will 
remain thus accessible at the stated location until at least one year after the last time you distribute an Opaque copy (directly or through your agents or 
retailers) of that edition to the public. 

It is requested, but not required, that you contact the authors of the Document well before redistributing any large number of copies, to give them a 
chance to provide you with an updated version of the Document. 

4. MODIFICATIONS 

You may copy and distribute a Modified Version of the Document under the conditions of sections 2 and 3 above, provided that you release the Modified 
Version under precisely this License, with the Modified Version filling the role of the Document, thus licensing distribution and modification of the 
Modified Version to whoever possesses a copy of it. In addition, you must do these things in the Modified Version: 

1. Use in the Title Page (and on the covers, if any) a title distinct from that of the Document, and from those of previous versions (which should, if there 
were any, be listed in the History section of the Document). You may use the same title as a previous version if the original publisher of that version 
gives permission. 

2. List on the Title Page, as authors, one or more persons or entities responsible for authorship of the modifications in the Modified Version, together 
with at least five of the principal authors of the Document (all of its principal authors, if it has fewer than five), unless they release you from this 
requirement. 

3. State on the Title page the name of the publisher of the Modified Version, as the publisher. 

4. Preserve all the copyright notices of the Document. 

5. Add an appropriate copyright notice for your modifications adjacent to the other copyright notices. 
6. Include, immediately after the copyright notices, a license notice giving the public permission to use the Modified Version under the terms of this 
License, in the form shown in the Addendum below. 

7. Preserve in that license notice the full lists of Invariant Sections and required Cover Texts given in the Document's license notice. 

8. Include an unaltered copy of this License. 

9. Preserve the section Entitled "History", Preserve its Title, and add to it an item stating at least the title, year, new authors, and publisher of the 
Modified Version as given on the Title Page. If there is no section Entitled "History" in the Document, create one stating the title, year, authors, and 
publisher of the Document as given on its Title Page, then add an item describing the Modified Version as stated in the previous sentence. 

10. Preserve the network location, if any, given in the Document for public access to a Transparent copy of the Document, and likewise the network 
locations given in the Document for previous versions it was based on. These may be placed in the "History" section. You may omit a network 
location for a work that was published at least four years before the Document itself, or if the original publisher of the version it refers to gives 
permission. 

11. For any section Entitled "Acknowledgements" or "Dedications", Preserve the Title of the section, and preserve in the section all the substance and 
tone of each of the contributor acknowledgements and/or dedications given therein. 

12. Preserve all the Invariant Sections of the Document, unaltered in their text and in their titles. Section numbers or the equivalent are not considered 
part of the section titles. 

13. Delete any section Entitled "Endorsements". Such a section may not be included in the Modified Version. 

14. Do not retitle any existing section to be Entitled "Endorsements" or to conflict in title with any Invariant Section. 

15. Preserve any Warranty Disclaimers. 

If the Modified Version includes new front-matter sections or appendices that qualify as Secondary Sections and contain no material copied from the 
Document, you may at your option designate some or all of these sections as invariant. To do this, add their titles to the list of Invariant Sections in the 
Modified Version's license notice. These titles must be distinct from any other section titles. 

You may add a section Entitled "Endorsements", provided it contains nothing but endorsements of your Modified Version by various parties--for example, 
statements of peer review or that the text has been approved by an organization as the authoritative definition of a standard. 

You may add a passage of up to five words as a Front-Cover Text, and a passage of up to 25 words as a Back-Cover Text, to the end of the list of Cover 
Texts in the Modified Version. Only one passage of Front-Cover Text and one of Back-Cover Text may be added by (or through arrangements made by) 
any one entity. If the Document already includes a cover text for the same cover, previously added by you or by arrangement made by the same entity 
you are acting on behalf of, you may not add another; but you may replace the old one, on explicit permission from the previous publisher that added the 
old one. 

The author(s) and publisher(s) of the Document do not by this License give permission to use their names for publicity for or to assert or imply 
endorsement of any Modified Version. 

5. COMBINING DOCUMENTS 

You may combine the Document with other documents released under this License, under the terms defined in section 4 above for modified versions, 
provided that you include in the combination all of the Invariant Sections of all of the original documents, unmodified, and list them all as Invariant 
Sections of your combined work in its license notice, and that you preserve all their Warranty Disclaimers. 

The combined work need only contain one copy of this License, and multiple identical Invariant Sections may be replaced with a single copy. If there are 
multiple Invariant Sections with the same name but different contents, make the title of each such section unique by adding at the end of it, in 
parentheses, the name of the original author or publisher of that section if known, or else a unique number. Make the same adjustment to the section 
titles in the list of Invariant Sections in the license notice of the combined work. 

In the combination, you must combine any sections Entitled "History" in the various original documents, forming one section Entitled "History"; likewise 
combine any sections Entitled "Acknowledgements", and any sections Entitled "Dedications". You must delete all sections Entitled "Endorsements." 

6. COLLECTIONS OF DOCUMENTS 

You may make a collection consisting of the Document and other documents released under this License, and replace the individual copies of this 
License in the various documents with a single copy that is included in the collection, provided that you follow the rules of this License for verbatim 
copying of each of the documents in all other respects. 

You may extract a single document from such a collection, and distribute it individually under this License, provided you insert a copy of this License into 
the extracted document, and follow this License in all other respects regarding verbatim copying of that document. 

7. AGGREGATION WITH INDEPENDENT WORKS 

A compilation of the Document or its derivatives with other separate and independent documents or works, in or on a volume of a storage or distribution 
medium, is called an "aggregate" if the copyright resulting from the compilation is not used to limit the legal rights of the compilation's users beyond 
what the individual works permit. When the Document is included in an aggregate, this License does not apply to the other works in the aggregate which 
are not themselves derivative works of the Document. 

If the Cover Text requirement of section 3 is applicable to these copies of the Document, then if the Document is less than one half of the entire 
aggregate, the Document's Cover Texts may be placed on covers that bracket the Document within the aggregate, or the electronic equivalent of covers 
if the Document is in electronic form. Otherwise they must appear on printed covers that bracket the whole aggregate. 

8. TRANSLATION 

Translation is considered a kind of modification, so you may distribute translations of the Document under the terms of section 4. Replacing Invariant 
Sections with translations requires special permission from their copyright holders, but you may include translations of some or all Invariant Sections in 
addition to the original versions of these Invariant Sections. You may include a translation of this License, and all the license notices in the Document, 
and any Warranty Disclaimers, provided that you also include the original English version of this License and the original versions of those notices and 
disclaimers. In case of a disagreement between the translation and the original version of this License or a notice or disclaimer, the original version will 
prevail. 

If a section in the Document is Entitled "Acknowledgements", "Dedications", or "History", the requirement (section 4) to Preserve its Title (section 1) will 
typically require changing the actual title. 

9. TERMINATION 

You may not copy, modify, sublicense, or distribute the Document except as expressly provided for under this License. Any other attempt to copy, modify, 
sublicense or distribute the Document is void, and will automatically terminate your rights under this License. However, parties who have received 
copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 

10. FUTURE REVISIONS OF THIS LICENSE 

The Free Software Foundation may publish new, revised versions of the GNU Free Documentation License from time to time. Such new versions will be 
similar in spirit to the present version, but may differ in detail to address new problems or concerns. See http://www.gnu.org/copyleft/. 

Each version of the License is given a distinguishing version number. If the Document specifies that a particular numbered version of this License "or 
any later version" applies to it, you have the option of following the terms and conditions either of that specified version or of any later version that has 
been published (not as a draft) by the Free Software Foundation. If the Document does not specify a version number of this License, you may choose any 
version ever published (not as a draft) by the Free Software Foundation. 

How to use this License for your documents 

To use this License in a document you have written, include a copy of the License in the document and put the following copyright and license notices 
just after the title page: 

Copyright (c) YEAR YOUR NAME. 
Permission is granted to copy, distribute and/or modify this document 
under the terms of the GNU Free Documentation License, Version 1.2 
or any later version published by the Free Software Foundation; 
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. 
A copy of the license is included in the section entitled "GNU 
Free Documentation License". 

If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts, replace the "with... Texts." line with this: 

with the Invariant Sections being LIST THEIR TITLES, with the 
Front-Cover Texts being LIST, and with the Back-Cover Texts being LIST. 

If you have Invariant Sections without Cover Texts, or some other combination of the three, merge those two alternatives to suit the situation. 

If your document contains nontrivial examples of program code, we recommend releasing these examples in parallel under your choice of free software 
license, such as the GNU General Public License, to permit their use in free software. 



